GB2425194A - Tracking user network activity using a client identifier

Info

Publication number: GB2425194A
Application number: GB0507670A
Authority: GB
Inventors: Karl Bunyan; Darren Beale
Original assignee: EXPONETIC Ltd
Current assignee: EXPONETIC Ltd
Priority date: 2005-04-15
Filing date: 2005-04-15
Publication date: 2006-10-18
Also published as: GB0507670D0

Abstract

User activity on a network is tracked by appending client identification data to hyperlinks (S702) which may be accessed, or to a window name of a browser. Code may be provided to find a client identifier, or if none is found to generate one. Data transmitted to a server on the network (S704) includes the identifier, thus user navigation may be logged and assessed. Further activity such as mouse clicks, data entry and system parameters may also be recorded. Data may be stored client-side in a cookie and supplied to a requesting server. A requesting server may be configured as a proxy server to a third party server (S701, S705) to enable user-tracking across different domains. Page sequence identifiers may be used to determine user actions e.g. Back/Forward, and user activity data may be deleted from a client once received at a server.

Description

A Method and Apparatus for Tracking Activity by a User's Computer on a Data Network

The present invention relates to an apparatus and method for tracking activity from a user's computer on a data network, and in particular, for tracking data requests from a user's computer to a network server such as an internet server or local area network (LAN) server. One aspect of the invention relates to methods and apparatus for generating and storing tracking information, and another aspect of the invention relates to analysis of the tracking data received from the user's computer.

Many website managers and owners are interested in monitoring how website visitors navigate through their website. For example, many internet commerce website owners are interested in knowing the type of navigation methods a customer employs in order to view, select and purchase products from the website. Tracking information on customer browsing activity can shed light on whether customers are looking in the wrong place for particular links on the website, or are giving up and leaving the website part of the way through the process of purchasing a product. In addition, tracking information can provide information on server load, and number of unsuccessful requests for files, indicating whether the server capacity is sufficient to meet demands.

The website owner can use the tracking information to improve the layout and design of the website, making it more customer-friendly, and can monitor the improved website to see if any problems have been resolved.

Several methods of tracking a user's web browsing activity are already well known.

One of the simplest tracking methods is log file analysis. All common web server software generates log files, which record every request by a user for a file on the web site. Even though the format of these log files may vary, there are standards which are generally adhered to. Requests for webpages, images and other files are all logged. Each request is logged by an IP address which, generally, represents a single user. Log file analysis involves analysing these recorded log files, to try and reconstruct the patterns of user activity on the website. No alteration needs to be made to the web page code to perform user tracking by log file analysis, because log files are generally produced automatically. There is no installation required to start analysing the usage, and historical reporting is also possible. As long as a server has been producing log files then these can be analysed for dates prior to the setup of any analysis software.
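As a minimal sketch of log file analysis, assuming the server writes the widely used Common Log Format (the text does not name a specific format), requests can be grouped by IP address to approximate each visitor's path. The sample entries and the function name `paths_by_ip` are invented for illustration.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "METHOD path PROTO" status bytes
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

def paths_by_ip(lines):
    """Group requested paths by client IP, in the order they appear in the log."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        visits[match["ip"]].append(match["path"])
    return dict(visits)

# Hypothetical sample entries (addresses taken from a documentation-only range).
SAMPLE_LOG = [
    '192.0.2.10 - - [15/Apr/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.10 - - [15/Apr/2005:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 2310',
    '192.0.2.77 - - [15/Apr/2005:10:00:07 +0000] "GET /index.html HTTP/1.1" 200 1043',
]

visits = paths_by_ip(SAMPLE_LOG)
```

The only grouping key available here is the IP address, so the result is only as reliable as the assumption that one address corresponds to one user.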

US2003/0130982 describes a data collection system that collects statistics related to each visit to each web page within a website. The system collects information such as the visitor's connection speed, time required to download the page, duration of time that the visitor spent at the page, whether the back button was used to leave the site, etc. However, log file analysis has the disadvantage that tracking by IP address gives only a minimal amount of information about the user's activities, and it can also be unreliable.

For example, if the site visitor is behind a corporate firewall, then many users may be wrongly identified as being a single individual. If the user is accessing the internet through a pool of proxy servers, such as in many dial-up internet connections, a single user may appear as multiple visitors. Even in the case where a website visitor has one unique and consistent IP address, there is still the problem that not all page hits are registered in the log file. For example, pages cached at any point in the network between the server and the browser (e.g. on a proxy server or a local cache) do not register as a page view in the log file. Also, spiders and other automated browsers may be indistinguishable from normal individuals.

One attempt at circumventing these problems involves the use of cookies. Cookies are small pieces of data that are stored on a visitor's computer, and so can be used to uniquely identify a visitor even if the IP address changes. Cookies can be stored on a user's computer for a long period of time, making it possible to identify a visitor on repeated visits to a site.

First party cookies are cookies which are served from the same web server as the originating web page, and so cannot be distinguished by origin from other website content. First party cookies are either generated by a call to the site's web server or are generated through on-page JavaScript (which may be served from any location). The use of first party cookies is well known, for example, for keeping track of a customer's "shopping basket" by an electronic commerce web site. The use of first party cookies is also known for tracking a user's activity on a website. WO02/44869 describes a system and method for generating and reporting cookie values at a website visitor's computer using cookie processing script embedded within the downloaded web page. Data mining code within the downloaded web page is operable on the visitor computer.

Third party cookies are cookies which are served from a web server that is different from the location of the originating web page. Third party cookies can be despatched from the third party web server by a call to any file on that web server. Third party cookies can thus be registered to an independent tracking service, instead of to the web server of the requested web page. This allows visitors to be traced across multiple servers, e.g. in order to provide market-wide statistics. Since third party cookies do not rely on JavaScript, they do not require any scripting abilities on the user's browser, and they will work on browsers that have JavaScript disabled.

US6393479 and US6766370 describe a method of internet website traffic flow analysis, using third party cookies. For every website page requested by a website visitor, the state of the visitor's browser is recorded and data relating to the path visitors take through the website is collected and studied. The state of the visitor's browser is maintained in a traffic analysis cookie. The data in the cookie can follow the visitor browser through independent file servers.

In US2004/0054784, a user requests a web page from a server, and program code within the requested web page is run on the user's computer to generate a unique identifier corresponding to the web page. The unique identifier includes a unique value and a time and date stamp. A session cookie identifies a particular web user session. By tracking user web sessions in this manner, web page data transmitted during a particular web user session can be efficiently and accurately correlated for analysis.

US2004/0128534 describes a method for tracking, correlating and analyzing a visitor's email and website access and behaviour. Tracking enabled e-mails and web pages with embedded communication software are used to capture and store a visitor's email address in a cookie. A unique identifier is added to the cookie, and the cookie is embedded in the HTML rendering component of the visitor's email application and web browser, so that information from the visitor's access to email and access to a website and behaviour there can be stored and used between the two applications and analysed by proprietary software.

A problem with the use of cookies is that most modern web-browsers have privacy controls to allow a user to disable the use of cookies altogether, or to accept them on a site-by-site basis. This renders tracking by cookies impossible. Many users set third party cookies to be blocked by default. Third party cookies without a compact privacy policy are blocked by the default settings of many browsers. This means that any system which relies on these cookies that is not properly configured will receive very little tracking data. Although users block third party cookies quite regularly, they are less likely to block first party cookies as they are often integral to a site's functionality.

However, first party cookies used for the purposes of tracking are generally generated by JavaScript code and so will not register on systems with JavaScript disabled. This may result in incorrectly counting users and will certainly prevent accurate tracking of user navigation.

Even if cookies are accepted and stored, they are easily deleted by experienced users.

Also, anti-spyware packages are becoming more widespread, and these block or delete tracking cookies regularly. This may render a session untrackable or make a single visitor appear as two distinct visitors.

Another common technique is the use of an invisible HTML image for web tracking. The image is usually a transparent GIF of 1 pixel square dimensions, which is placed in every web page that it is desired to track. The source of the image is a script, usually located on the tracking company's server, which gathers information about the page the image is on once it is loaded. The advantage of the invisible image technique is that it works on all browsers that can display images. However, it has the disadvantage that images from specific locations can be blocked, which would result in no tracking information being registered for users with this configuration. Some web browsers provide an option to block images which are not from the original web site, which is the case with the majority of tracking images. Also, images may be cached and so not all page views are registered. As the user identification is still by IP address, it has the same problems as log file analysis in the inability to identify all unique users. Work is required in adding code to each page, on the part of whoever runs the site that needs to be tracked. This may be straightforward or may require the manual editing of a large number of pages.

Cookie tracking is often used in conjunction with invisible images. This allows the same user to be identified as they move from one page to the next.

A variation on the HTML invisible image method described above is to use JavaScript images. Instead of the image tag being generated through static HTML, it is generated through JavaScript. Most modern methods of tracking use JavaScript as the principal tracking method, with a <noscript> tag containing a static HTML invisible image for non-JavaScript browsers. The JavaScript code passes a range of tracking parameters through to the tracking server, together with the HTTP request. Either a third party cookie is returned by the tracking server or a first party cookie is placed by JavaScript and is used to identify the user as they move from page to page. Commonly, a static HTML image is located in a web page as a backup for tracking, in case JavaScript is not enabled.

The use of JavaScript allows additional information to be gathered, since JavaScript has access to a wider range of information, such as screen size and whether cookies are enabled. Although basic information such as IP address and the http-user-agent (i.e. the type of browser, version, platform) can be provided without the use of JavaScript, additional information such as screen resolution, session ID, previous webpage (referrer), visitor ID (which is a cookie-based long-term user identifier), screen width, height and number of colours, and whether cookies are supported can be made available from the user's computer using JavaScript.

JavaScript also allows user events and interactions to be recorded, such as cursor position, mouse clicks, mouseout (i.e. if the mouse leaves the window), buttons pressed, text entered into fields, start of page load, completion of page load, completion of image or frame load, user input of data starts, user input of data finishes, user data, hot spot and hyperlink roll-overs, hyperlink selection, mouse position, field selections, browser toolbar usage, next URL (Uniform Resource Locator) selected, user editing field information, user identification (log on name, device, IP address, etc). The JavaScript code can store a running count of when the event occurred, in milliseconds from when the page was loaded, and each recorded activity can be given a time stamp to allow a picture of the user's browsing session to be reconstructed. The JavaScript document.write method allows new elements to be written into a web page.

JavaScript allows cache-busting code to be introduced, to ensure page views are registered even when the user is viewing pages through a caching system, so that the data is more accurate. For example, JavaScript may be used to add a random number to the end of a URL, so that the URL appears as though it does not already exist in the cache. This causes the URL to be requested from its original location rather than from the cache, thus allowing the server to register a "hit".
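The cache-busting idea can be sketched as follows. The text describes this being done by script in the browser; Python is used here only to illustrate the URL manipulation. The parameter name `cb` and the example URL are assumptions made for this sketch.

```python
import random
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_cache_buster(url, rng=random):
    """Return url with a random query parameter so caches treat it as a new resource."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # "cb" is a hypothetical parameter name chosen for this sketch.
    query.append(("cb", str(rng.randrange(10**9))))
    return urlunsplit(parts._replace(query=urlencode(query)))

busted = add_cache_buster("http://www.example.com/track.gif?page=home")
```

Because the random value changes on every call, an intermediate cache is unlikely to hold an entry for the resulting URL, so the request should reach the original server.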

GB2357679 describes a JavaScript monitoring applet for a web browser, which records user interactions and transmits these to a remote monitoring server. The applet may be JavaScript code within a web page, to be loaded and deleted as the page is visited and left. The JavaScript code is included into the web page by the author or maintainer of the page. It may include a registration key, to verify that the web page and remote monitoring server correspond.

US6112240 describes a web site client information tracker. A tracker tag is used in the code of a web page, for initiating a client information tracking program, which may be on the client or remote. The tracking program is initiated by a tracker message transmitted from a web browser on the client computer. The tracking program may obtain and store client information. The apparatus for obtaining client information may include a mechanism for intercepting a request from the browser to display a previously downloaded web page (e.g. the "back" button being pressed), and a mechanism for controlling the client to notify the tracking program that the web page is displayed on the client.

However, it is relatively simple on most browsers to disable JavaScript, and this will prevent any information being generated. As a back-up, an invisible image is usually included as part of the html in a <noscript> tag. Again, code must be added to each page, requiring work on the part of whoever runs the site that needs to be tracked. This may be straightforward or may require the manual editing of a large number of pages.

Another prior art tracking technique uses Java applets. Java applets are small applications (written in Java) that run inside a web browser. Java is a fully-fledged application programming language that is usually installed as an add-on to an operating system through a virtual machine that must be instantiated in order to run Java code.

Java applets are able to maintain state independent of cookies. Java code is cached, so once a Java applet has been downloaded, it is generally not required to download it again in the same session. However, not all browsers have support for Java applets installed, making all tracking in such browsers by this method impossible. Also, Java can be disabled just as easily as JavaScript, resulting in an inability to track users or even count page views.

A further prior art method uses Flash "pseudo-cookies" instead of standard web browser cookies, using Flash's local shared objects (LSO). However, this method is limited to Flash enabled web browsers, and will not work if a user disables LSO in their Flash installation. Common web browsers will also block LSO if the privacy settings are set to their highest level. Like cookies, Flash local shared objects can also be easily deleted either by users or by software applications designed to protect users' privacy (anti-spyware applications) and are likely to be targeted by these applications in the future.

A further method, network traffic analysis, is also known to be used for web tracking, but is much less common than the other prior art techniques discussed. Network traffic analysis involves monitoring the flow of packets on a network, to determine what is happening on the network. However, it can be difficult and complicated to filter out the relevant data packets from a large number of data packets relating to other applications and purposes.

Another method of tracking is by the use of URL session id's. URL session id's are variables that are generated and appended to the URL of each hyperlink in a web page. They do not affect the file or network location indicated by the hyperlink, but simply act as a label, passing additional information to the server along with the file request. By using, for example, a unique combination of letters and digits as a URL session id, a user can be identified to a server by virtue of this session id, which is shown in the browser's address bar for each web page. Session id's can serve as a substitute for cookies. Some web servers or scripting solutions automatically generate session id's where cookies are unavailable. This method of encoding session variables and appending them to a URL is also known as URL munging. Server-side generation of session id's is often used in e-commerce engines, but is not known to be used in any web analytics application.

No cookies are required for tracking by URL session id, making it less sensitive to the browser preferences or to a user deleting cookies during a session. Session id's are visible and can be copied, pasted and bookmarked. Thus, two people may appear to have the same session ID if one e-mails a link to the other, unless logic is specifically put in place to prevent this. A session id cannot identify a user returning to a site at a later date, unless the session id is stored by the user, e.g. in the user's bookmarks.

US2002/0178186 discloses a method of URL munging to seamlessly integrate third party content (e.g. a search engine) into a web page. US2003/0105807 describes the use of URL munging for transferring data between websites.

US6710786 describes a method and apparatus for incorporating state information into a URL, in order to maintain a stateless server. The server incorporates state information (e.g. client ID, operation ID and status) into a URL which is sent to the client as a link in a webpage. When the client requests the URL by selecting the link, the client sends the state information back to the server.

An object of the present invention is to provide improved tracking of activity of users' computers on a network.

A first aspect of the present invention provides a method and apparatus for tracking data requests from a user's computer to a web server. Code representing a web page is provided to the user's computer, the code including executable instructions to the user's web browser to modify hyperlinks in the web page by adding an identification label to each said hyperlink. For example, the identification label may be a text string including a unique sequence of characters.

The user normally selects hyperlinks within the web page to navigate within the same website. When navigating away from the current web page, the user may select a hyperlink from the web page, thus causing the user's web browser to request data corresponding to the hyperlink from a remote server. A web server then receives a data request from the user's computer for data corresponding to one of the modified hyperlinks and including said identification label. This allows the web server or a third party server to use the identification label to track data requests from the user's computer.
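A minimal sketch of the hyperlink modification is given below. In the invention this step is carried out by script in the user's browser; Python is used here only to show the URL handling. The query parameter name `uid`, the sample label, and the decision to skip non-web links such as `mailto:` are assumptions made for illustration and are not stated in the text.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def label_hyperlink(href, label, param="uid"):
    """Append the identification label to one hyperlink as a query parameter."""
    parts = urlsplit(href)
    if parts.scheme not in ("", "http", "https"):
        return href  # assumption: leave mailto: and similar links untouched
    # Drop any stale label first so the parameter appears only once.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, label))
    return urlunsplit(parts._replace(query=urlencode(query)))

def label_all(hrefs, label):
    """Apply the label to every hyperlink found in a page."""
    return [label_hyperlink(href, label) for href in hrefs]

links = label_all(
    ["/products.html", "basket.html?item=42#top", "mailto:sales@example.com"],
    "20050415101500-483920",
)
```

The label is carried in the query string, so the file or location that each hyperlink points to is unchanged, and any fragment such as `#top` is preserved.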

The executable instructions may include instructions to the user's computer to obtain the identification label from a web browser cookie on the user's computer, if the user's computer has such a cookie. The executable instructions may include instructions to the user's computer to obtain said identification label from the window name of the user's web browser, if the window name includes an identification label. The executable instructions may include instructions to the user's computer to obtain said identification label from the address bar of the user's web browser, if the address bar includes an identification label. The executable instructions may include code to generate a new identification label if an existing identification label is not found already stored on the user's computer. The identification label may be generated using a current date and time, and a pseudo random number.
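The lookup and generation steps can be sketched as below, again using Python only to illustrate logic that the text places in browser script. The label format, the `uid=` marker, and the order in which the three sources are checked are assumptions made for this sketch. The text lists the sources but does not give them a priority.

```python
import random
import re
from datetime import datetime, timezone

# Hypothetical label format: 14 date/time digits, a dash, 6 random digits.
LABEL_PATTERN = re.compile(r"uid=(\d{14}-\d{6})")

def generate_label(now=None, rng=random):
    """Build a label from the current date and time plus a pseudo random number."""
    now = now or datetime.now(timezone.utc)
    return f"{now:%Y%m%d%H%M%S}-{rng.randrange(10**6):06d}"

def find_or_generate_label(cookie, window_name, address_bar, now=None, rng=random):
    """Return an existing label if one is stored, otherwise generate a new one."""
    # Assumed order of preference: cookie, then window name, then address bar.
    for source in (cookie, window_name, address_bar):
        match = LABEL_PATTERN.search(source or "")
        if match:
            return match.group(1)
    return generate_label(now, rng)

found = find_or_generate_label(
    "", "uid=20050415101500-000042", "http://www.example.com/?uid=20050415101501-000007"
)
fresh = find_or_generate_label("", "", "http://www.example.com/")
```

Combining a timestamp with a pseudo random number makes a collision between two visitors unlikely without the server having to allocate identifiers.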

The code may include executable instructions to set a visitor identification cookie on the user's computer for identifying the user's computer during a period extending over multiple web browsing sessions, e.g. several days, and providing executable instructions for the user's computer to read the visitor identification cookie and transmit its value to a remote server.

Code may be provided to record user interaction events in the user's web browser and to transmit said recorded events to a remote server. Examples of such user interaction events are mouse clicks, mouse scrolls, key presses, scrolling within a web browser window, moving the mouse out of a web browser window, selecting navigation options within a web browser window, etc. The event recordings can be stored in a temporary form (e.g. volatile memory), such as in a web browser script variable, but can also be backed up in a more permanent form (e.g. stored on disc), such as in a cookie on the user's computer. This reduces the problem of event data being lost in the event of the user navigating away from the page before event data has been transmitted to the remote server.

Successful reporting of the events to the server may result in code being sent from the server to the user's computer to instruct the user's computer to delete from the cookie the events that have been successfully reported to the server.
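The interplay between the in-memory record, the cookie backup and the deletion of acknowledged events can be sketched as follows. This is an illustrative model in Python rather than browser script. The class name `EventRecorder`, the JSON encoding of the cookie value and the numeric event ids are assumptions made for this sketch.

```python
import json

class EventRecorder:
    """Holds events in memory (the script variable) and mirrors them into a cookie value."""

    def __init__(self, cookie_value=""):
        # Restore any unreported events left in the cookie by an earlier page.
        stored = json.loads(cookie_value) if cookie_value else {}
        self.events = {int(key): value for key, value in stored.items()}
        self.cookie_value = cookie_value
        self._next_id = max(self.events, default=0) + 1

    def record(self, kind, ms_since_load):
        """Store one event with its time offset from page load, then back it up."""
        event_id = self._next_id
        self._next_id += 1
        self.events[event_id] = {"kind": kind, "t": ms_since_load}
        self._backup()
        return event_id

    def acknowledge(self, reported_ids):
        """Delete events the server has confirmed receiving."""
        for event_id in reported_ids:
            self.events.pop(event_id, None)
        self._backup()

    def _backup(self):
        self.cookie_value = json.dumps(self.events)

recorder = EventRecorder()
first = recorder.record("click", 120)
second = recorder.record("keypress", 480)
recorder.acknowledge([first])                      # server confirmed the click
next_page = EventRecorder(recorder.cookie_value)   # the keypress survives navigation
```

Only events that the server has not yet confirmed remain in the cookie, so an event recorded just before the user leaves a page can still be reported from the next page.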

Code may be provided to generate a unique page sequence identifier for each time a web page is received by the browser from a web server, and to transmit said page sequence identifier to a remote server. This allows multiple visits to the same web page to be differentiated. The page sequence identifier may be a numerical or alphanumerical value which is increased or incremented each time a web page is received by the web browser.

In one embodiment, the page sequence identifier may be generated as follows. The first time a web page is requested from the web server, the page sequence identifier is set to an initial value, for example, a numerical value of 1. Each subsequent time that a web page is requested from the web server, the current page sequence identifier is passed to the web server, and the current page sequence identifier is then incremented for use in the newly requested web page. The page sequence identifier may be incremented by code on the user's computer, which may receive a previous page sequence identifier, e.g. from the web server or from local data storage means, and generate a new page sequence identifier using this received previous value.

For example, in an embodiment that includes adding an identifier and a page sequence identifier to URLs in a webpage, a user selection of a hyperlink will send the current page sequence identifier to the web server. This same page sequence identifier will appear in the web browser address bar for the newly requested web page. The code on the user's computer which adapts the hyperlinks to include a page sequence identifier may recognise the value of the page sequence identifier which is shown in the address bar, and increment this value before adding it to hyperlinks in the new webpage. Thus, in this embodiment, the page sequence identifier is incremented only when the method of navigation to a webpage is by a user selecting a hyperlink from a previous webpage.

If the user navigates back to a particular webpage by using the "back" button, or using the "reload" button, then the page sequence identifier will not be incremented, because the page sequence identifier in the web browser address bar will be the same as it was at the previous time of loading that particular webpage.
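The behaviour described above can be sketched in two parts: the client-side increment, and a server-side guess at how each page was reached. The text only states that "back" and "reload" do not increment the identifier, so the classification rule below, the label names and the function names are assumptions made for illustration.

```python
def sequence_for_links(address_bar_seq):
    """Value the client adds to hyperlinks on the page it has just loaded."""
    if address_bar_seq is None:
        return 1  # first page of the visit: start at the initial value
    return address_bar_seq + 1

def classify_visit(address_bar_seqs):
    """Label each page load from the identifiers seen in consecutive requests."""
    labels = []
    previous = None
    for index, seq in enumerate(address_bar_seqs):
        current = seq or 0  # no identifier in the address bar counts as 0
        if index == 0:
            labels.append("entry")
        elif current == previous + 1:
            labels.append("link")      # identifier advanced by one: a hyperlink was followed
        elif current == previous:
            labels.append("reload")    # same identifier again
        elif current < previous:
            labels.append("back")      # an earlier identifier reappeared
        else:
            labels.append("unknown")
        previous = current
    return labels

# Entry page, two link clicks, back button, reload, then another link click.
labels = classify_visit([None, 1, 2, 1, 1, 2])
```

Under this simple rule a press of the "forward" button looks the same as a link click, so a fuller analysis would need additional signals.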

Cache busting code may be provided, to reload at least one element of a webpage from its original location even if part of the webpage is cached. This reduces the risk of tracking information failing to be generated, because a web page is downloaded from a cache instead of from the original website.

Code may be provided to request a file from a third party server, and to supply information about the webpage to the third party server along with the file request. The user's computer may receive a cookie from the third party server in response to sending said request. The information supplied to the third party server may include an identification label identifying the user's computer to the third party server.

The code representing a web page may be generated by a third party server, and used to modify a web page generated by or provided by the web server. Code may be sent to the user to cause the user's browser to send event tracking data to a remote location over the network. Code may be sent to the user to cause the user's computer to set up a cookie to store said event tracking data. The hyperlink may be modified by adding a user ID number generated by the user's computer, and a page counter indicating the sequence in which this page has been visited relative to previous and subsequent page visits. The user ID number may be a random number.

The above method may be used together with log file analysis techniques or other tracking techniques to track the user's activities. A cookie may be set on the user's computer to identify the user during a future visit to the website. The server may analyse the reliability of the user tracking data and provide a score to indicate said reliability. The server may determine the method of navigation used by the user to move between pages of the website (e.g. the browser's back button, a link from a previous webpage, etc).

In embodiments of the present invention, if cookies are blocked, the website will still operate, and tracking is still possible. However, the known prior art techniques need to use cookies in addition to JavaScript, or need third party cookies, and will not work if cookies are disabled on a user's browser. In addition, the present invention allows many different parameters such as IP address, browser type, screen resolution, etc. to all be used together to see if a user is the same user as before.

Embodiments of the present invention thus involve client side generation of client id's. This has several advantages over server side generation of client id's. Most website tracking using JavaScript requires each web page of the website to be modified, in order to include the tracking code. For server side generation of client id's, the code needs to be installed on the server. It can be a complex task to identify every type of link, form and redirect, and add a session id generating function onto the end, especially if a large number of web pages are involved. In embodiments of the present invention, it is not necessary to perform complex modification of the code on the server. Embodiments of the present invention have the advantage of being platform independent. As no code is interpreted on the server, it can be used on any platform, and runs on the client side in a language which is cross-platform.

A major advantage of tracking using client side code, on the user's computer instead of on the server, is that it is not necessary to rewrite the code according to the particular technology used on the server, because the code is run after the page is delivered to the client. For example, the server may generate HTML code using one of many different technologies, such as PHP, ASP, PERL etc. In systems that use server side code to generate a url session ID, it is necessary to integrate with whatever technology the server is using to generate the web pages.

Embodiments of the present invention provide for the possibility of providing a plug-in application for the web server to inject code as the web page is transferred to the user's computer.

A further aspect of the present invention provides a method and apparatus for tracking file requests from a user's computer, by providing instructions to the user's computer to modify the window.name property of an internet browsing program running in a window on the user's computer by adding identification data to the window name. The window.name property is usually only used by websites which run in framesets or to identify pop-up windows. On most sites it is blank. The window.name property can be accessed by JavaScript code and, as a property of the browser window, is persistent between pages. Further information can be added for use in identifying the user's computer, such as a sequence of characters unique to the user's computer. The identification data may be retrieved from previous storage on the user's computer, and added to the window name, for example, it may be read from a cookie on the user's computer. Alternatively, the identification data may be newly generated, if no stored version is found.

A web server may request the user's computer to send the identification data located in the window name of the user's web browser, thus allowing identification of the user's computer. For example, it may send a web browser script to the user's computer to store the identification data in a cookie or script variable, and transmit this to the web server. Alternatively, the user's computer may send the identification data to a web server as part of a file request for a file from the web server.

In some cases, a user may already have a window name. For example, a user may have come from a web site within a frameset, where the web browser window name has been altered to a non-default value. In this case, the unique window name is preferably left unaltered in order to avoid conflict with the operation of other web sites. Thus, an aspect of the invention also comprises requesting identification data stored in a window name of an internet browsing program running in a window on the user's computer, and using the identification data to identify file requests made from a particular user's computer. The identification data can be any data suitable for identifying that particular computer.
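The handling of the window name can be sketched as a small decision function. This is a Python model of logic that would run as browser script against the window.name property. Treating an existing window name as the identification data is one reading of the text, and the function and parameter names are assumptions made for this sketch.

```python
def update_window_name(window_name, stored_id, generate):
    """Return (new_window_name, identification_data) for the current browser window."""
    if window_name:
        # The window already has a name, set on an earlier page or by another
        # site's frameset: leave it unaltered and use it to identify requests.
        return window_name, window_name
    # Blank name: reuse a stored identifier (e.g. from a cookie) or make a new one.
    identifier = stored_id or generate()
    return identifier, identifier

blank_with_cookie = update_window_name("", "abc123", lambda: "unused")
blank_no_cookie = update_window_name("", None, lambda: "gen42")
already_named = update_window_name("checkoutFrame", "abc123", lambda: "unused")
```

Because the window name persists from page to page within the same window, the identifier remains available even when cookies are disabled.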

It is possible to modify the browser window name without having a document object model (DOM) enabled browser, thus this method can be applied in older as well as modern web browsers. The use of a window name to store a session ID does not rely on the user's computer having cookies enabled. Therefore, even if cookies have been disabled, the browser is still able to send a session ID to a third party server.

A yet further aspect of the present invention involves a method and apparatus for providing a third party tracking service by configuring a web server as a proxy server.

The method includes providing code to configure a web server to act as a proxy server for the third party server. The code includes instructions to forward tracking information received at the proxy server to the third party server. The code may be in the form of a plug-in application for the web server, to enable easy installation and maintenance. The code may also include instructions for the web server to receive data or requests from the third party server and to forward these to the user's computer. The third party server may also provide tracking code to be run on the user's computer. The tracking code may include instructions to the user's computer to send data stored in a cookie or an image request to the proxy server.

Configuring a web server as a proxy server is particularly useful if a user has disabled third party images and third party cookies. In that case, it is possible for the user's browser to send tracking information back to the web server that provided the web page, but tracking information addressed directly from the user's browser to a third party server would be blocked. A small piece of code on the web server may act as the proxy server for the third party server, thus passing all tracking information on to it. In effect, this code sets up the web server as a dummy server for tracking information.
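As a minimal sketch of the forwarding decision only, the proxy code on the web server might map each incoming tracking request to an address on the third party server. The path "/track", the address "https://tracker.example/collect" and the function name forwardTarget are hypothetical; the actual transmission of the request is omitted.

```javascript
// Hypothetical addresses, for illustration only.
const PROXY_PATH = "/track";
const TRACKER_BASE = "https://tracker.example/collect";

// Map a request received by the web server to the third party URL it should
// be forwarded to, or return null if it is not a tracking request.
function forwardTarget(requestUrl) {
  // A placeholder base lets relative request paths be parsed.
  const incoming = new URL(requestUrl, "http://placeholder.invalid");
  if (incoming.pathname !== PROXY_PATH) {
    return null;
  }
  const target = new URL(TRACKER_BASE);
  target.search = incoming.search; // pass the tracking data through unchanged
  return target.toString();
}
```

Because the user's browser only ever addresses the web server that supplied the page, such a request is a first party request and is not affected by the blocking of third party images or cookies.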

A further aspect of the invention comprises a method and apparatus for tracking events on a user's computer and sending tracking information to a server. The method includes running a script within a web browser on the user's computer to record events within the web browser, and storing the recorded events in a script variable. The information from the script variable can also be stored on disk, e.g. in a cookie on the user's computer. Event information for at least some of said events is then sent to the server.

After the server has confirmed receipt of the event information, the received event information can be deleted from the cookie. The received event information may be deleted automatically, as the server may send code to the user's computer to instruct the user's computer to remove the received data from the cookie or other disk storage location. Thus, the script may indirectly act as a receipt, because any deleted data must have been received successfully.

The user's computer may be configured to monitor the length or size of the script variable, and transmit information from the script variable to the server if the script variable length or size exceeds a predetermined threshold. The user's computer may manipulate the script variable to ensure it does not exceed a predetermined length, e.g. by storing some of its contents only in a cookie or saved on disk.

The script variable may include one or more page visit identifiers, each page visit identifier relating to an instance of a web page being received from the server or reloaded from a cached location. The page visit identifiers are used to indicate the webpage for which each said event occurred. The page visit identifiers may comprise random or pseudo-random values, and they may be numerical, alphanumerical, or may be some other sequence of characters or bits.

Further aspects of the invention relate to analysis of the tracking data after or during the data tracking process.

One further aspect of the invention comprises obtaining and logging tracking information from the user's computer, said tracking information including page numbers identifying web pages received by the user's computer; and creating a chronological list of web pages received by the user's computer. The page numbers may have been allocated or generated in a predetermined sequence, so that they can easily be arranged into chronological order of download to the user's computer. For example, each page number may include the date and time of download.
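For illustration, and assuming a page number of the hypothetical form "<milliseconds since epoch>-<random suffix>", the chronological list might be built as sketched below. The record fields pageNumber and url and the function name chronologicalPages are assumptions of this sketch.

```javascript
// Assumed page number format: "<milliseconds since epoch>-<random suffix>".
// Each record is { pageNumber: string, url: string }.
function chronologicalPages(records) {
  return records
    .map((r) => ({ url: r.url, time: Number(r.pageNumber.split("-")[0]) }))
    .sort((a, b) => a.time - b.time) // earliest download first
    .map((r) => r.url);
}
```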

Another aspect of the invention comprises receiving tracking information from a user's computer, and estimating a likelihood that one or more particular methods of navigation were used by the user to navigate from a first to a second web page. For example, such methods of navigation may include use of the browser forward or back button, selection of a hyperlink within a currently viewed web page, moving back to a selected location in the viewed web page history list of the web browser, moving to a bookmarked web page, moving back to the start of the session, accepting an instruction from a separate piece of software on the user's computer to navigate to a different web page, etc. It is often not possible to be certain that a particular navigation path was followed by a user's computer to navigate through a web site, e.g. due to incomplete data. The calculation of one or more likelihood or probability value or estimate can allow a most likely navigation method to be identified. It can also provide an indication of the reliability of a reconstructed user navigation path.

A further aspect of the invention comprises estimating the reliability of tracking of a particular user session, by using estimated likelihoods of one or more navigation methods.

In embodiments of the invention, as an alternative or addition to storing information in cookies, it is possible to store information in the form of other locally persistent web browser data, such as Flash local shared objects.

Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of a computer network in which an embodiment of the present invention is implemented; Figures 2A-C are flowcharts showing a method of tracking a user's progress through a website, in one embodiment of the invention; Figure 3 is a flowchart showing a method of logging a user's progress through a website, in one embodiment of the invention; Figures 4A and 4B are flowcharts showing a method of detecting and reporting events in the user's web browser, in one embodiment of the invention; Figure 5 is a flow chart showing the process on the server during events tracking, in the embodiment of figures 4A and 4B; and Figures 6A-C are flowcharts showing the process of analysing the data sent from the user's computer to estimate the likely session path and its reliability, in one embodiment of the invention.

Figure 7 is a block diagram showing the different possible methods of navigation from a web page; Figure 8 is a flowchart showing an example of a user session on a particular website; and Figure 9 is a flowchart showing an example of a web server acting as a proxy for a third party tracking server.

Figure 1 shows a computer network suitable for implementing embodiments of the invention. The network includes a user's computer 100, a web server 101, and a third party server 102, which are all connected to the internet 103. In some embodiments of the invention, the tracking is done by the third party server 102. In other embodiments, the tracking is done by the web server 101, and the third party server 102 may be omitted.

The user's computer 100 runs web browser software 112, which sends HTTP requests over the internet 103 to the web server 101, to request web pages or other data from the web server 101. HTTP communication usually takes place over TCP/IP connections.

Typically, the HTTP requests will use TCP/IP port 80, as is standard, although this is not essential.

The web server 101 has web server software 105 running on it, and has a log 106 for logging data requests from users. The web server 101 responds to the HTTP request from the user's computer 100 by sending the requested data back to the user's computer 100. The html code of the requested web page includes code to provide tracking facilities to monitor the user's web browsing activities. This code may be integral to the web page, as provided by the web server 101. For example, the code may be stored in a script store 107 on the web server 101, and copied into the web page when a data request is made by a user's computer. Alternatively, the web page as provided by the web server 101 may be modified by the third party server 102, to add tracking code and cause the user's computer 100 to send tracking information back to the web server 101 and/or to the third party server 102.

The amount and type of tracking information which can be sent back from the user's computer 100 is dependent on whether the user's browser has JavaScript enabled, first party cookies enabled, third party cookies enabled, or third party images enabled. If cookies are enabled in the user's web browser 112, the user's computer 100 also has a cookie store 113.

The information that is available to the server via HTTP includes the IP address, HTTP_REFERER (originating page), and the Server Time. Other information available via a URL in a webpage includes Session ID, Visitor ID, Page number, Page identification, Client Time, Screen Width, Screen Height, Colour depth, Cookie, and Originating page referrer, if Javascript is enabled.

Where JavaScript is enabled, the information that can be received also depends on whether the user is using a modern browser, which supports Document Object Model (DOM) operations, or an older browser. The Document Object Model is a platform and language neutral interface that defines the logical structure of documents and the way that documents are accessed and manipulated. DOM allows programs and scripts to construct documents, and to dynamically access and update the content, structure, and style of a document. Elements and content can be added, modified, or deleted from the document. Using the Document Object Model, almost anything found in an HTML document can be accessed, added, changed, or deleted. Older browsers have a less sophisticated document object model, for example, they do not allow dynamic changing of links. In such browsers, the JavaScript document.write method lets JavaScript code write new elements into the web page.

In embodiments where the tracking is done by the third party server 102, the web server may optionally include code to configure it as a proxy server 108 for the third party server 102. The third party server 102 includes a store for storing tracking data 110, and code for analysing the tracking data 111. The third party server 102 may also include a script store, for storing scripts to be run on a user's computer 100, such as tracking code scripts or scripts for indicating receipt of tracking code.

Figure 2 is a flowchart showing a method of tracking user activity at the user's web browser, according to an embodiment of the invention. The process starts at step S201, where the user's web browser starts parsing the html code received from the web server.

The html code includes embedded JavaScript code, which is designed to track and report the user's activities.

At step S202, if JavaScript is not supported or enabled on the user's web browser, the browser will be unable to run any JavaScript code. The process will then go on to step S203, where an alternative method of monitoring the user activity is used. This involves running the <noscript> section of the web page code. The <noscript> tag is a standard option for allowing non-javascript enabled browsers to follow an alternative html-based procedure.

In this example, the <noscript> section contains instructions to request an image from a third party server, and to send tracking information to the third party server along with the request. Usually, the image will be an invisible image of 1 pixel by 1 pixel. For example, the following code may be used to request an image from the third party server, where TRACKER_URL is the third party website URL, TRACKER_SITE_ID identifies the web server and TRACKER_PAGENAME identifies the particular web page received by the user's web browser from the web server.

If the user has disabled third party images in their web browser, then the user's browser will block the request for a third party image in step S203. In this worst case situation, no information at all will be supplied to the third party server. However, it may be possible to use a third party non-image request to get round browser blocking of third party images.

In the case where JavaScript is running on the user's browser, the process goes from step S202, where the browser starts parsing the tracking code, to step S204, where the JavaScript code within the web page starts running, and sets up static JavaScript variables. These variables include cookie names, a session ID name and other variable names.

The process then proceeds to step S205, where the user's computer sets up a page visit identifier, which is a label to identify the user's visit to that particular web page.

Normally, the page visit identifier will be a random number, generated by the user's computer, although alternative methods of providing an identification number are also possible. The page visit identifier uniquely labels each page visit by the user. If the user visits a page for a second time by either reloading it, or navigating back to it using a hyperlink in the website, then the page will have a different page visit identifier when it is re-loaded.

Once the page visit identifier has been set up, the next step is S206, where the user's web browser identifies whether or not the user already has a cookie with tracked events stored in it. Examples of events in javascript are mouse move, scroll, resize. An events cookie may have been generated by JavaScript during the user's previous visit to the web page, to another web page on the website or to a related website. If such a cookie already exists with stored tracking events, then the process moves to step S207, where the user's browser sets up a JavaScript string for storing events, and stores these previous events in the events string.

An example of the data stored in the events cookie and events string is as follows. The Javascript variable es is the events cookie name, and the same information is stored in both the events cookie and the events string.

es=@@11284{90, 50,0, 100, ", 200, 500}{ }{ }

The @@ is a unique label to indicate the start of the events on a particular web page.

The number 11284 is the unique page identifier number for the web page at which the events have occurred. The data for each event is stored in a set of brackets {}. Thus, the above code represents three events, although example data is only shown for the first event. The events data may include the mouse co-ordinates on the screen, the x and y scroll values, the window width and height, and an event object string with details of the particular event e.g. a mouse click. If multiple events occur on a single page, the page visit identifier may be present for the first event, but omitted for subsequent events on that page.
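A minimal sketch of how such an events string might be split back into page visits and events is given below. It takes the value after "es=" and assumes that no event contains a closing bracket within its data; the function name parseEvents and the shape of the returned objects are assumptions of this sketch, and the individual event fields are left as unparsed text.

```javascript
// Parse the value of the events string (the text after "es=") into a list of
// { pageVisitId, events } objects. Event contents are returned as raw text.
function parseEvents(es) {
  const pages = [];
  // Each "@@" marks the start of the events recorded on one page visit.
  for (const chunk of es.split("@@").slice(1)) {
    const header = /^(\d+)/.exec(chunk);
    if (!header) {
      continue; // skip a chunk that does not begin with a page identifier
    }
    const events = [];
    const braces = /\{([^}]*)\}/g;
    let match;
    while ((match = braces.exec(chunk)) !== null) {
      events.push(match[1].trim()); // one entry per set of brackets
    }
    pages.push({ pageVisitId: header[1], events });
  }
  return pages;
}
```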

If no events are already stored in a cookie, the process moves from step S206 to step S208, where an event string is set up, ready to store new events. The process then proceeds to step S209.

At step S209, the user's web browser retrieves the current date and time from the system clock of the user's computer. The next step is S210, where the user's computer looks in the events cookie for a session id. If, at step S211, a session id is found in the cookie, then at step S212, the user's browser sets the JavaScript session id to be the session id found in the cookie, and the process goes on to step S220.

If no session id is found in the cookie at step S211, the user's computer looks for a session id in the address bar of the web browser at step S213. If a session id is found in the address bar at step S214, then at step S215, the user's browser sets the JavaScript session id to be the session id found in the address bar, and the process goes on to step S220.

If no session id is found in the address bar, the process moves on to step S216, where the user's computer looks for a session id in the window.name. The window.name is a session-persistent property which is not visible to the user. The window.name may store a session id on a long term basis. If, at step S217, a session id is found, the process continues to step S218, where the user's computer retrieves the session id from window.name. The process then moves on to step S220.

If no session id is found in window.name, then the process moves on to step S219, where the user's computer creates a new session id from the date, time and a random or pseudo-random number. There is thus only a very small probability that two people accessing the website during the same millisecond will choose the same random number and thus have the same session ID.

The next step is step S220, where the session id is set in a cookie to expire after 30 minutes. Thus, a user browsing session in this example is a continuous period during which a user is browsing the tracked website without taking a break of 30 minutes or more.

In most internet browsers with JavaScript support, the web browser window has a name property that can be set for the window, for example by using the Javascript "window.name" object. At step S221 in the flowchart, the user's computer looks for a window.name set to hold a session id value. The window.name value persists from one session to the next, within that browser window. Although changing the window.name value to hold a session ID can be less reliable than using cookies or adding an ID tag to the URLs in the page, use of the window.name has a high likelihood of persisting over many page views and user sessions for as long as the browser window remains open.

If the window.name has not been set with a session id already, then the window.name is set to the session id, at step S222, and the process continues to step S223. If a window.name that has already been set is found, the process moves straight from step S221 to step S223.

Thus, the method shown in the flowchart of figure 2 uses the first session ID that the user's web browser can find, from the three possible locations of URL, cookie or window.name. If no session ID is found in any of these three locations, a new session ID is generated.
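The lookup order of steps S210 to S219 might be sketched as follows. The use of "krbdlb" as the name under which the session id is stored in the cookie and in the window name, and the function names readValue, newSessionId and resolveSessionId, are assumptions of this sketch; the cookie text, query string and window name are passed in as plain strings so that the logic does not depend on a browser.

```javascript
// Read one named value from a "k=v; k2=v2" cookie string or a "?k=v&k2=v2"
// query string. Returns null if the name is absent or has no value.
function readValue(text, name, separator) {
  for (const part of text.replace(/^\?/, "").split(separator)) {
    const [key, value] = part.trim().split("=");
    if (key === name && value) {
      return value;
    }
  }
  return null;
}

// Step S219: date and time in milliseconds followed by a random component.
function newSessionId(now, random) {
  return String(now.getTime()) + String(Math.floor(random() * 1e6));
}

function resolveSessionId(sources, now = new Date(), random = Math.random) {
  const { cookie, search, windowName } = sources;
  const fromName = windowName.startsWith("krbdlb=") ? windowName.slice(7) : null;
  return (
    readValue(cookie, "krbdlb", ";") || // steps S210 to S212
    readValue(search, "krbdlb", "&") || // steps S213 to S215
    fromName ||                         // steps S216 to S218
    newSessionId(now, random)           // step S219
  );
}
```

In a browser, the three sources would be document.cookie, location.search and window.name respectively.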

At step S224, the user's computer looks for a visitor id in a cookie. The visitor id is for identifying a user over multiple sessions. If, at step S225, a visitor id is found, the visitor id is set from the cookie at step S226, and the process continues at step S228. If no visitor id is found, the process goes from step S225 to step S227, where the visitor id is set to be identical to the session id. The process then continues at step S228, where a visitor id cookie is set to expire in 30 days.

The next step is S229, where the user's computer looks for a page sequence number in the address bar. The page sequence number is a counter for the page. For example, it is set to zero in the first page visited on the website. If the user then navigates to another page on the website, the page sequence number is incremented. If the user returns to the first page visited by following a path of hyperlinks, then this first page is given a new page sequence number on the second visit. However, if the first page is revisited by the user pressing the back button on the web browser, then the page sequence number remains as before. This allows the tracking server to distinguish between people navigating back to the first page, or people simply retracing their steps through the website.

If, at step S230, a page sequence number is found, the user's computer retrieves the page sequence number from the address bar at step S231. By giving each page a number in a browsing sequence, forward navigation can be distinguished from backward navigation, reload, etc. The process then continues at step S233.

If no page sequence number is found at step S230, the process continues at step S232, where the page sequence number is set to zero. The process then continues at step S233.

At step S233, the user's computer creates the URL to send with all the data to be captured.

At the next step, S234, the user's computer determines whether the browser supports DOM operations. Most modern browsers have this capability, although most older browsers do not. If the browser does support DOM operations, the process goes through steps S236 to S240.

At step S236, the user's computer creates a script block in the page with the tracking data in the URL. At step S237, the user's computer appends the session id and next page number to all of the links in the page. For example, the modified URL may have one of the following formats:

http://www.website.com/index.asp?krbdlb=123456
http://www.website.com/krbdlb=123456
http://www.website.com/?krbdlb=123456
http://www.website.com?krbdlb=123456
http://www.website.com/index.asp?cartid=s&krbdlb=123456&pagenum=2

At step S238, the user's computer appends the session id and next page number to all the form destinations in the page. A web form normally contains html form tags, each tag having an action and a normal URL. It is a simple matter to modify the URL on the form tag, in the same way as modifying the URL on ordinary hyperlinks in the web page.
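By way of example only, reading the page sequence number (steps S229 to S232) and appending the session id and next page number to a link (step S237) might be sketched as below. The parameter names krbdlb and pagenum follow the example URLs above; the function names pageSequence and decorateLink are assumptions of this sketch.

```javascript
// Steps S229 to S232: read the page sequence number from the query string of
// the address bar, defaulting to zero when none is present.
function pageSequence(search) {
  const raw = new URLSearchParams(search).get("pagenum");
  const n = raw === null ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(n) ? 0 : n;
}

// Step S237: append the session id and the next page number to one link.
function decorateLink(href, sessionId, currentPage) {
  const url = new URL(href);
  url.searchParams.set("krbdlb", sessionId);
  url.searchParams.set("pagenum", String(currentPage + 1));
  return url.toString();
}
```

In a DOM enabled browser, the same function could be applied to the href of each entry in document.links and to the action of each entry in document.forms, which corresponds to steps S237 and S238.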

At step S239, the user's computer adds event handlers to events to be tracked. Then, at step S240, the user's computer starts tracking events. The process then goes to step S241, where the browser returns to the page flow.

If the browser on the user's computer does not support DOM operations, the process goes from step S234 to step S235, where the user's computer writes the image tracking tag into the page, with the unique ID number to identify the user's computer. For example, the image tracking tag may be written as <img src="kldq...&krbdlb=123456&rn=995"> Then, the process goes on to step S241, where the browser returns to the page flow.

Next, at step S242, the browser registers changes in the HTML content. It sends data to the server through the new element's URL at step S243. At step S244, the browser finishes parsing the tracking code.

If the user moves to another page on the site, the above process repeats. This time, the unique random number to identify the page will have a different value. The page sequence number will be incremented by one, and the session ID will remain the same as before.

The stored events will be recorded in JavaScript, but the events cookie will be updated regularly with the new events, and when it reaches a predetermined size, it will be sent to the server.

If the user moves to a new web page on the site, this will be shown in the events cookie, by the labelling of the events with the page identifier. For example, if on a webpage with page number pn=11284, the events cookie is:

es=@@11284{90, 50,0, 100, ", 200, 500}{ }{ }

and the user moves to a new webpage with page number pn=2973, then after some events have been recorded on the new web page, the events cookie will look like:

es=@@11284{90, 50,0, 100, ", 200, 500}{ }{ }@@2973{ }{ }

All data to be sent is stored in a cookie. A confirmation is sent to the user's computer by the server, to confirm receipt of the data, before the received data is deleted from the cookie. Typically, information is stored every 300ms. The data to be sent is limited to 2 kilobytes, because of the maximum size of a URL allowed in the common web browsers. Thus, a trigger point of approximately 1 kilobyte is chosen, and if the event string exceeds this length, then the data is sent. This corresponds to about 30 seconds of user browsing. If the data to be sent is too large, it may be necessary to trim it down.

The 1k of data is stored both in the javascript variable and in the cookie. Data that is lost from the javascript variable will be kept in the cookie. Both the javascript variable and the cookie are trimmed when a receipt for data is received from the server. This is transmitted by loading JavaScript from a remote server. The script is loaded into the web browser and is run within the domain of the current page and is therefore able to manipulate JavaScript variables within it. It is unique to the present invention to be able to do this synchronisation and deletion. The cookie stores all unacknowledged events, but the javascript variable only stores the events for one page. The Javascript code may have a variable storing the length of the javascript event string variable, and a counter incrementing this length where necessary. It is possible to use an array instead of just a number to track the sequence of data sent to the server.

There is a larger size limit on a cookie than a URL. If the length of the stored data in the cookie exceeds the threshold, the events cookie will be sent to the server. If this is successfully acknowledged, then the relevant part of the events cookie will be deleted.
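The threshold and receipt behaviour described above might be sketched as follows. The class name EventBuffer, its method names and the send callback are assumptions of this sketch; writing the pending data to a cookie and the actual transmission to the server are left to the caller.

```javascript
const SEND_THRESHOLD = 1024; // trigger point of approximately 1 kilobyte

class EventBuffer {
  // `send` is a caller-supplied function that transmits data to the server.
  constructor(send) {
    this.pending = ""; // all unacknowledged event data
    this.send = send;
  }

  // Append one event; transmit once the stored data exceeds the threshold.
  record(eventText) {
    this.pending += "{" + eventText + "}";
    if (this.pending.length > SEND_THRESHOLD) {
      this.send(this.pending);
    }
  }

  // Called when the server's receipt arrives: remove only the data that the
  // server has confirmed, keeping anything recorded in the meantime.
  acknowledge(receivedLength) {
    this.pending = this.pending.slice(receivedLength);
  }
}
```

In this sketch, data that has been sent but not yet acknowledged stays in the buffer and is sent again with later events, so that nothing is discarded before a receipt has been received.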

Figure 3 is a flowchart showing the logging process, for logging the tracking data received from the user's computer. This tracking data may be received at the web server or at a third party server.

The process starts at step S301, where the server receives tracking data, generated by the JavaScript tracking code embedded within the downloaded web page. The next step is S302, where the server checks to see whether the session id has been received in the data. If the session id has been received, then at step S303, the server sets up the logged session id to be the same as the session id received from the client. If the session id has not been received in the data from the web browser, then at step S304, the server checks to see if the session id is stored in a third party cookie on the user's computer. If the session ID is found in a third party cookie, then at step S305, the server sets up the logged session id to be the same as the session id in the third party cookie. If the session ID is not found in the received data, or in a third party cookie, then at step S306, the server creates a new session id for recording the received data.

After setting up the session id for recording the data, the next step is S307, where the server logs the received server-side data. Then, at step S308, the server logs the received data. When the received data has been logged under an appropriate session id, the server sends a third party cookie to the web browser, including the session id, at step S309. If the web browser has not disabled third party cookies, then the cookie will be set on the user's computer, and the server can recognise the user in the future, when further tracking data is received.

Figure 4 is a flowchart showing the event capture process on the user's computer. The process starts at step S401, where the user interacts with their computer to fire an event.

For example, the event may be a mouseup, mousemove, scroll, mouseout, resize, etc. Next, at step S402, the javascript code in the user's web browser causes the user's computer to calculate the window width, window height, scroll co-ordinates and mouse co-ordinates. At step S403, the user's computer identifies whether the event was a mousedown event, or if it has been more than a set time period since the last event was registered, or if it looks as if the mouse has moved out of the window.

If the event was identified as a mousedown event, then at step S404, the user's computer captures some information about what has been clicked on by the user. At step S405, the user's computer captures information about the event source element.

Next, at step S406, the user's computer calculates how many milliseconds after the page was first loaded this event occurred. The process then continues on the next page, at step S407, where the user's computer builds up a short string containing the event information. If at step S408, the event string is beyond a certain length, then the process moves on to step S410, where the user's computer manipulates the event string to ensure that it does not exceed a chosen length. Then, at step S411, the user's computer changes the tracking script tag source with the event string data on a URL. The process then goes on to step S412.

If the event string is below the predetermined length, then the process goes immediately to step S412, where any remaining event string data is stored in a cookie on the user's computer. The next step is S413, where the user's computer sends the cookie to the server, and receives confirmation of event data receipt from the server. If the mouse is moved out of the window or a mouse button is pressed, the event may be sent straight away, otherwise events are sent only when the length of the event string exceeds the predetermined threshold.

The confirmation from the server may take the form of code sent to the user's computer, to clean up the cookie by removing the received data. The absence of data in the cleaned cookie then confirms receipt of that data by the server. At step S414, the user's computer removes data that has been delivered to the server from the event string and cookie. The function ends at step S415.

Figure 5 shows the same process as figures 4A and 4B, but shows steps carried out on the server. Firstly, at step S500, the server receives events data from the user's computer. This could be sent as a session variable on an http request. An example of the format of the events data sent from the user's computer to the server is krbdlb=110231&e=@@11257{....}{....}, where 110231 represents the session ID, and e=@@.... is an events string, storing events data for several different times.

At step S501, the server logs the received event data. Then, at step S502, the server generates javascript code to delete the received event data from the event string and event cookie on the user's computer. At step S503, the server sends the generated javascript code to the user's computer. This code will run on the user's computer at step S504, to remove the received event data from the event string and event cookie.

Removal of this data acts as a receipt from the server to confirm that it has received the event data that the user's computer sent to it.
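As an illustrative sketch of steps S500 to S503, the server might log the received events and reply with a short script that tells the user's computer how much data to remove. The function name handleEventsRequest and the client-side function trackerAcknowledge named in the generated script are hypothetical; the log is represented here by a plain array.

```javascript
// `query` is the query string of the tracking request, e.g.
// "krbdlb=110231&e=@@11257{...}{...}". `log` is an array standing in for the
// server's tracking data store.
function handleEventsRequest(query, log) {
  const params = new URLSearchParams(query);
  const sessionId = params.get("krbdlb");
  const events = params.get("e") || "";
  log.push({ sessionId, events }); // step S501: log the received event data
  // Steps S502 and S503: generate the script returned to the user's computer.
  // trackerAcknowledge is a hypothetical function defined by the tracking
  // code, which removes the stated number of characters from the event
  // string and event cookie.
  return "trackerAcknowledge(" + events.length + ");";
}
```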

Figure 6 shows an example of different possible methods of navigation from a web page. The user can either reload the page, navigate back to the first page of a session, press the back button of the browser to navigate back to the referer web page, press the forward button of the browser to navigate forwards to a previously viewed page after the back button has been used, click on an internal link in the webpage to take the user to another web page in the same web site, or click on an external link in the web page to take the user to a web page on a different web site. Other possible navigation methods (not shown) include the use of the history list and the use of bookmarks. Part of the aim of the analysis part of the process described in figures 7A to 7C is to reconstruct which of these navigation types has taken place at each stage.

Figures 7A to 7C are flowcharts showing a method of session path analysis according to an embodiment of the invention. The process starts at step S601, where a session is created on the web server or third party server. The next step is step S602, where the server stores session wide properties, such as the screen size, browser type, etc. Next, in step S603, the server loops through all page views recorded for the session. At step S604, the server takes each page URL and processes it into a standard form. This uses a method for taking generic URLs with a mixture of base URL, significant and non-significant query string parameters, and creating a common representation so that identical pages are not double counted. For example,

http://www.ab.com/index.php?pageID=5&sessionID=101010&finished=true -> http://www.ab.com/index.php?finished=true&pageID=5

and

http://www.ab.com/index.php?pageID=5&sessionID=1111111&finished=true -> http://www.ab.com/index.php?finished=true&pageID=5

so both pages are registered as being from the same source.

The next step is step S605, where the server identifies the source page and stores it.

This is a method for identifying pages based on their URL that does not depend on session-specific query string data.

The process then moves on to step S606, where the server identifies the referring page either as coming from this site, a search engine or an external site, and the server stores this identification.
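A possible form of the referrer identification in step S606 is sketched below. The list of search engine host fragments is purely hypothetical, and the extra "none" result for a missing referrer is an assumption added for completeness rather than something the description specifies.

```python
from urllib.parse import urlsplit

# Hypothetical host fragments used to recognise search engines.
SEARCH_ENGINE_HINTS = ("google.", "yahoo.")

def classify_referrer(referrer, site_host):
    """Return 'none', 'internal', 'search' or 'external' for a referrer URL."""
    if not referrer:
        return "none"  # assumed handling of a missing referrer
    host = (urlsplit(referrer).hostname or "").lower()
    if host == site_host.lower():
        return "internal"
    if any(hint in host for hint in SEARCH_ENGINE_HINTS):
        return "search"
    return "external"
```

The returned label would then be stored with the page view for use in the later navigation analysis.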

At step S607, the server adds events for each page. Then, at step S608, the server determines how likely it is that we will be able to track this session completely. This depends on the tracking results received using cookies, JavaScript and whether the server receives data relating to multiple pages.

At step S609, the server loops through all the page views in the session and determines the navigation sequence. Then, at step S610, the server awards a score based on the likelihood of being a particular navigation type, for example, forward navigation, backward navigation, page reload, back to first page of session, first page of session, or new first page.

Each type of navigation starts with a score of zero. As the tracking data is analysed, each type of navigation has its score incremented, according to the likelihood of that navigation type being the method actually used in the session. If cookies, JavaScript, etc. are disabled, the score allocated to a particular navigation type may be increased by a smaller amount, due to a reduction in certainty that this is the correct navigation type, and the other scores may be increased to compensate. Thus, the scores provide a reliability indicator, giving the reliability of tracking the path, actions, navigation steps or events taken through the website. The page number, referrer URL and events are the main things used to allocate a score to each possible navigation type. Thus, if event information is included in the tracking information, the reliability of tracking is improved. Tracking is not restricted to using the page numbering.
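The scoring scheme described above might be sketched as follows. The navigation type names follow step S610, but the numerical weights, the `confidence` factor and the rule for sharing the withheld amount equally among the other types are assumptions made for illustration only.

```python
NAVIGATION_TYPES = ("forward", "backward", "reload",
                    "back_to_first", "first_page", "new_first_page")

def new_scores():
    """Every navigation type starts with a score of zero."""
    return {nav: 0.0 for nav in NAVIGATION_TYPES}

def add_evidence(scores, nav_type, weight, confidence=1.0):
    """Increment one navigation type's score.

    With confidence below 1.0 (for example, cookies or JavaScript disabled)
    the favoured type gains less, and the remainder is shared equally among
    the other types to compensate (an assumed compensation rule).
    """
    gained = weight * confidence
    scores[nav_type] += gained
    others = [nav for nav in scores if nav != nav_type]
    for nav in others:
        scores[nav] += (weight - gained) / len(others)
    return scores

example = add_evidence(new_scores(), "forward", 10, confidence=0.5)
```

In this sketch the same piece of evidence contributes less to "forward" when tracking certainty is reduced, which lowers the margin by which that type leads the others.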

The next step is S611, where the server determines whether the page is the first page in the session. If the server decides that the page is the first page in the session, the server then checks to see if the referrer link is an internal link, or an external link, at step S612.

If the referrer link is internal, at step S613, the server deduces that the page is probably an incorrectly tracked page, since the first page of a session should have an external referrer. The process then continues at step S615. However, if the referrer link is external, at step S614 the server can be very sure that this is the first page in the session. The process then continues at step S615.

At step S615, the server checks to see if there are other pages in the session. If there are other pages, then at step S616, the server concludes that the session is very likely to be trackable, and the process continues on the next page, at step S628. If there are not any other pages in the session, the server concludes at step S617 that this may be a single page session, or that it may be a multi page session where tracking of the other pages was unsuccessful.

The above applies if the page was determined to be the first page in the session.

However, if at step S611, the page is not found to be the first page in the session, then the process moves to step S618. At step S618, the server compares the page URL and the referrer to see if they look the same as the first page of the session. If they do, then at step S619, the server deduces that it is likely that this is a return to the first page of the session. If they do not, then at step S620, the server checks whether the referrer of the current page is identical to the URL of the previous page. If the referrer of the current page and the URL of the previous page are identical, then at step S621, the server concludes that the most likely navigation type is forward navigation.

If the referrer of the current page is different from the URL of the previous page, the process moves to step S622, where the server checks to see whether it has seen this combination of URL and referrer before. This identification of whether a page has been visited before, based on its URL and its referrer's URL, assists in deciding whether the visit is logged as forward, backward, or reload navigation, etc. If the combination of URL and referrer has not been seen before, then the process goes on to step S628.

However, if the combination of URL and referrer has been seen before, the process goes on to step S623, where the server determines whether or not the combination of URL and referrer was seen before in the previous page view. If it was the previous page view, then at step S624, the server concludes that the current page view is probably a reload of the previous page view. If it was not the previous page view, then at step S625, the server concludes that this is probably an instance of backward navigation. The server then checks, at step S626, whether the previous page view was reached using backward navigation. If not, the process continues at step S628. If it was, then at step S627, the server concludes that it is possible that the user is following a backward navigation path, and then proceeds to step S628.
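The branch of the flowchart for pages other than the first page (steps S618 to S625) can be summarised in a short sketch. It assumes that each page view has been reduced to a (URL, referrer) pair of already standardised URLs, and it returns a single label rather than adjusting scores; the further check for a backward navigation path (steps S626 and S627) is omitted. The function name and labels are illustrative.

```python
def classify_page_view(views, i):
    """Classify views[i] (with i >= 1) from its URL and referrer.

    views is a list of (url, referrer) tuples in the order received.
    """
    current, first, previous = views[i], views[0], views[i - 1]
    if current == first:              # S618, S619: same as first page
        return "back_to_first"
    if current[1] == previous[0]:     # S620, S621: referrer is previous URL
        return "forward"
    if current not in views[:i]:      # S622: combination not seen before
        return "unknown"
    if current == previous:           # S623, S624: seen in previous view
        return "reload"
    return "backward"                 # S625: seen earlier in the session

session = [
    ("/home", "http://search.example/"),
    ("/a", "/home"),
    ("/b", "/a"),
    ("/a", "/home"),
    ("/a", "/home"),
    ("/home", "http://search.example/"),
]
labels = [classify_page_view(session, i) for i in range(1, len(session))]
```

In the example session the user moves forward twice, goes back to an earlier page, reloads it, and finally returns to the first page of the session.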

At step S628, the server checks to see if the previous page view has recorded events associated with it. If not, the process goes on to step S636. If the previous page does have recorded events, the process goes on to step S629, and we can be more confident about our assessment of the page sequence and navigation type. A counter in the page URL can correctly follow non-linear page navigation e.g. going back three pages.

The next step is S629, where the server determines whether the last event was a click. If not, at step S634, the server determines whether the last event was a mouseout.

If the last event was a click, then at step S631, the server determines whether the URL of the object clicked on looks like the current URL. If so, the server concludes, at step S632, that it is very likely that this was forward navigation, and proceeds to step S633. If the URL of the object clicked on does not look like the current URL, then the process goes directly on to step S633.

If the last event was not a click, but was a mousedown, then the process goes from step S634 to step S635. At step S635, the server concludes that it is more likely that the navigation was back or reload navigation, and then the process proceeds to step S633.

If the last event was neither a click nor a mousedown, the process goes from step S634 to step S633.

At step S633, the server logs the navigation type and probability by picking the navigation type with the highest score. The process then continues to step S636, where the server marks the session as traceable or not, based on the score threshold for each page view.
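Steps S633 and S636 might be implemented along the following lines. Expressing the logged probability as the winning score's share of the total, and requiring every page view to reach the threshold, are both assumptions; the description states only that the highest score is picked and that a score threshold is applied for each page view.

```python
def pick_navigation(scores):
    """S633: return (navigation_type, probability) for the highest score."""
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    probability = scores[best] / total if total else 0.0
    return best, probability

def session_is_traceable(page_scores, threshold):
    """S636: assumed rule that each page view's best score must reach
    the threshold for the session to be marked as traceable."""
    return all(max(scores.values()) >= threshold for scores in page_scores)

choice = pick_navigation({"forward": 6.0, "backward": 2.0})
```

With the example scores, forward navigation is selected with a share of three quarters of the total score.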

Figure 8 shows an example of possible user session paths on a particular website. The user may enter the web site from another web site, such as a search engine. The user may click on internal links in the web page to navigate to another web page on the same web site. The user may exit the website by exiting the browser or starting a new user session. An aim of the process in figures 7A to 7C is to construct a sequence of how each transition occurred between different web pages.

A further aspect of the present invention involves providing a third party tracking service via a proxy server. This is particularly useful if a user has disabled third party images and third party cookies in their web browser. In that case, it is possible for the user's browser to send tracking information back to the webserver that provided the webpage, but tracking information addressed directly from the user's browser to a third party server would be blocked.

A small piece of code on the web server acts as the proxy server for the third party server, thus passing all tracking information on to it. In effect, this code sets up the web server as a dummy server for tracking information.

Figure 9 shows an example of this further embodiment of the invention, where a web server acts as a proxy for a third party tracking server. In this example, the third party server provides tracking code to the web server for every HTTP request, although this is not essential, as the tracking code could simply be stored at the web server.

In step S700, the web server receives an HTTP request from the user. In step S701, the web server requests tracking code from a third party server. In step S702, the web server receives tracking code from the third party server and adds the tracking code to the web page html code. At step S703, the web server sends the modified web page to the computer of the user who made the HTTP request. At step S704, the web server receives tracking data from the user's computer. At step S705, the web server forwards the tracking data to the third party server.
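The proxy arrangement of steps S700 to S705 is sketched below without any real network code. The two callables `fetch_tracking_code` and `forward_to_third_party` stand in for the requests to the third party server and are hypothetical, as is the choice to insert the tracking code immediately before the closing body tag.

```python
def handle_page_request(page_html, fetch_tracking_code):
    """S700 to S703: add third party tracking code to the page to be sent."""
    tracking_code = fetch_tracking_code()        # S701, S702
    marker = "</body>"
    if marker in page_html:
        # Assumed insertion point: just before the closing body tag.
        return page_html.replace(marker, tracking_code + marker, 1)
    return page_html + tracking_code

def handle_tracking_data(tracking_data, forward_to_third_party):
    """S704, S705: pass tracking data received from the user's computer
    on to the third party server unchanged."""
    return forward_to_third_party(tracking_data)

page = handle_page_request("<html><body>Hi</body></html>",
                           lambda: "<script>track()</script>")
forwarded = []
handle_tracking_data({"sessionID": "101010"}, forwarded.append)
```

Because the user's browser only ever communicates with the web server, the exchange is first party from the browser's point of view.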

As will be clear from the previous discussion, the amount of data obtainable is very dependent on the user's web browser settings. The following table shows what information it is possible to obtain in each case.

Session ID    Basic data    Extended data    Event data    Visitor ID
No scripts, yes

From an image tag alone, the IP address and http_user_agent (i.e. the type of browser, version and platform) are available to the tracking website plus some other basic browser information. This is classified as "Basic data" in the above table.

Extra information available using JavaScript includes (but is not limited to) the following: IP address, http_user_agent, screen resolution, session ID, previous page (referrer), visitor ID (cookie based, for longer term user identification), screen width, height, number of colours, and whether cookies are supported. This is "extended data" in the above table.

If JavaScript is not enabled on the browser, it is not possible to obtain extended data or event data. Basic data is still available, but we need to rely on cookies to obtain a session ID and visitor ID.

If JavaScript is enabled, but the browser does not support DOM operations, it is possible to obtain basic data, extended data, a session ID, and if cookies are enabled, also a visitor ID. However, it is not possible to obtain event data.

If JavaScript is enabled, and the browser does support DOM operations, then basic data, extended data, a session ID, and event data can all be obtained. A visitor ID can be obtained if cookies are enabled.
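The three cases above can be condensed into a single function, given here as a minimal sketch. The function name and the dictionary keys are illustrative; the rules themselves are those stated in the preceding paragraphs.

```python
def obtainable_data(javascript, dom, cookies):
    """Return which classes of data can be obtained for the given
    browser settings (each argument is True or False)."""
    return {
        "basic": True,                        # available from an image tag
        "extended": javascript,
        "event": javascript and dom,
        "session_id": javascript or cookies,  # cookies needed without scripts
        "visitor_id": cookies,
    }

no_scripts = obtainable_data(javascript=False, dom=False, cookies=True)
full = obtainable_data(javascript=True, dom=True, cookies=False)
```

For example, with scripts disabled and cookies enabled, only basic data and the two identifiers are available, whereas with scripts and DOM support but no cookies, everything except the visitor ID is available.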

Embodiments of the present invention provide for the possibility of providing a plug-in application for the web server to inject code as the web page is transferred from the server to the user. This allows tracking code to easily be included in any webpage, without requiring extensive modification at the server.

Another aspect of the invention is the use of the window.name field to store a session ID. This does not rely on the user having a DOM enabled web browser. Therefore, even if cookies have been disabled, the browser is still able to send a session ID to the third party server. The window.name may be used to track the user, and store an ID.

Another possibility is that if the user has bookmarked the munged URL, or copied it for future use, then the use of this bookmark or copied link will retain the session ID. The session ID expires when used in a cookie, because the cookie is set to expire. However, the session ID does not expire after a time limit when used in the window.name, unless the browser window is closed. If the third party server recognises an expired session ID, it can use this information to deduce that the same computer was involved in both the original session and the new session. The third party server can tell if the session ID is expired by checking the date and time section of the session ID, and comparing to the date and time of the user request. By looking back at old files in the log, the session ID can be used to identify user sessions. If the original session had a visitor cookie set, and the later session had the visitor cookie reset (e.g. due to user deletion of the cookie), then the two can be matched.
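The expiry check on the date and time section of a session ID could look like the following sketch. The ID layout (a 14 digit timestamp, a dash, then an 8 digit pseudo random number) and the 30 minute lifetime are assumptions; the specification states only that the ID is generated from the current date and time and a pseudo random number.

```python
import random
from datetime import datetime, timedelta

def make_session_id(now, rng=random):
    """Build an ID from the current date and time plus a pseudo random
    number (assumed layout: YYYYMMDDHHMMSS-NNNNNNNN)."""
    return f"{now:%Y%m%d%H%M%S}-{rng.randrange(10**8):08d}"

def session_id_expired(session_id, request_time,
                       lifetime=timedelta(minutes=30)):
    """Compare the date and time section of the ID with the request time."""
    created = datetime.strptime(session_id.split("-", 1)[0], "%Y%m%d%H%M%S")
    return request_time - created > lifetime

sid = make_session_id(datetime(2005, 4, 15, 12, 0, 0))
```

A request carrying an expired ID, for instance from a bookmarked link, can then be logged as a new session while still being linked back to the original session.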

In principle, capturing events is not new. Conventionally, event data is sent every time a user clicks on something in the webpage, or changes something in a page. However, this can cause problems with data flow. If the connection is slow or intermittent, the event data may not arrive in time. A further aspect of the present invention provides a solution to this problem, by storing events for a webpage in a cookie as well as in a JavaScript variable. The cookie is cleared when data is received by the server.
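The dual storage of events can be modelled as below. This is a simulation in Python of behaviour that would run in the browser, so the class and method names are illustrative only; the lists stand in for the script variable and the event cookie.

```python
class EventBuffer:
    """Models the two client-side stores for recorded events."""

    def __init__(self):
        self.script_variable = []   # lost when the page is unloaded
        self.event_cookie = []      # survives until the server confirms

    def record(self, event):
        """Store each event in both places."""
        self.script_variable.append(event)
        self.event_cookie.append(event)

    def page_unloaded(self):
        """Leaving the page discards the script variable only."""
        self.script_variable = []

    def pending(self):
        """Events not yet confirmed as received by the server."""
        return list(self.event_cookie)

    def acknowledge(self, received):
        """Delete from the cookie the events the server has received."""
        self.event_cookie = [e for e in self.event_cookie
                             if e not in received]

buffer = EventBuffer()
buffer.record("click:link1")
buffer.record("mouseout")
buffer.page_unloaded()
still_pending = buffer.pending()
buffer.acknowledge(["click:link1"])
after_ack = buffer.pending()
```

Events that did not reach the server before the page changed remain in the cookie and can be sent with a later request; removal from the cookie serves as the receipt.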

Embodiments of the present invention provide an improved method of user tracking, by use of a page sequence identifier for each instance of a webpage loaded from the web server, or reloaded from a cached location. The page sequence identifier is important for allowing a complete path to be traced through a site. A page sequence identifier can thus provide a page counter for pages downloaded from the website. It may comprise, for example, an alphanumeric sequence or a page number, which is incremented each time a webpage is loaded. For example, the page sequence identifier may be set to a first numerical value in a first visit to the webpage, and to a second, higher numerical value in a second visit. This allows the third party server to tell if a user has navigated to that page by pressing the back button on their browser, or if the page has actually been reloaded by clicking a link or pressing reload. Benefits of the page number aspect of the invention may also be achieved using server side code.

The present invention may be used in combination with other known techniques. For example, data available from a log file can be compiled and combined to provide statistics or information such as the number of requests made by users to the website in a given time interval, the total number of files and amount of data successfully sent to users' computers, the number of requests by type of file, the distinct IP addresses served and the number of requests each made, the number of requests by domain suffix (derived from IP addresses), the number of requests for specific files or directories, the number of requests by HTTP status codes (successful, failed, redirected, informational), totals and averages by specific time periods (hours, days, weeks, months, years), URLs from which users came to the site (referring pages), and the browsers and versions making the requests. Also, information can be obtained on the usability of the web site, the speed of user navigation, the typical routes through the website, webpage hot spots or trouble spots, the time to complete transactions on the website, the time from page load to page exit, and the exit routes used from each webpage. For an internet commerce website, for example, information can be obtained on the number of new customers per unit time, the number of customers who ordered, the number of customers who tried and failed to order, and the total web site orders and their value.
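A few of the log file statistics listed above can be compiled with a short sketch. It assumes that each log record has already been parsed into an (IP address, path, HTTP status) tuple, and that status codes of 400 and above count as failed; both are assumptions, as the specification does not define a log format.

```python
from collections import Counter

def status_class(code):
    """Map an HTTP status code to the four classes named in the text."""
    classes = {1: "informational", 2: "successful", 3: "redirected"}
    return classes.get(code // 100, "failed")

def summarise(log_records):
    """log_records: iterable of (ip, path, status) tuples (assumed layout)."""
    records = list(log_records)
    return {
        "requests": len(records),
        "by_status": Counter(status_class(s) for _, _, s in records),
        "by_ip": Counter(ip for ip, _, _ in records),
        "by_path": Counter(path for _, path, _ in records),
    }

summary = summarise([
    ("192.0.2.1", "/a", 200),
    ("192.0.2.1", "/b", 404),
    ("192.0.2.2", "/a", 302),
])
```

Statistics of this kind describe server load and file requests, and can be combined with the session path analysis to give a fuller picture of site usage.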

It is preferable to make the added JavaScript code as small as possible, since it will be added to every page on the website. For example, the JavaScript code may be no more than around 3kB in size.

The web server may receive code for sending to the user's computer as plug-in code, which can easily be inserted straight into many webpages without requiring a modification of the existing webpage script. For example, the plug-in code may be supplied by a third party, e.g. from a third party server to the web server, either automatically or otherwise. However, if other JavaScript code already exists on the page, for example, for popup windows or drop down menus, conflicts may occur between the existing code and the tracking code. For example, conflicts of JavaScript variable names may occur. It may therefore be more difficult in some cases to re-write the links dynamically, and it may be necessary to check the compatibility of the existing code and tracking code, and modify one or the other accordingly.

The web server may receive code in the form of plug-in code, to configure it as a proxy server to a third party server.

The various aspects of the present invention, including the client side URL munging, the event tracking using a cookie and a script variable to store events data, the configuration of a web site as a proxy server for a third party server, the use of the window name in a web browser to store tracking information, and the generation and use of page sequence identifiers, may each be used independently, or may be used in any combination.

The present invention can be implemented by software or programmable computing apparatus. The code for each process in the methods according to the invention may be modular, or may be arranged in an alternative way to perform the same function. The methods and apparatus according to the invention are applicable to any computer with a network connection.

The present invention encompasses a carrier medium carrying machine readable instructions or computer code for controlling a programmable controller, computer or number of computers as the apparatus of the invention. The carrier medium can comprise any storage medium such as a floppy disk, CD ROM, DVD ROM, hard disk, magnetic tape, or programmable memory device, or a transient medium such as an electrical, optical, microwave, RF, electromagnetic, magnetic or acoustical signal. An example of such a signal is an encoded signal carrying a computer code over a communications network, e.g. a TCP/IP signal carrying computer code over an IP network such as the Internet, an intranet, or a local area network.

The method according to embodiments of the present invention may be performed on any networked computer, including for example, a PC, an interactive television based system, a telephone based system such as WAP, etc. While the invention has been described in terms of what are at present its preferred embodiments, it will be apparent to those skilled in the art that various changes can be made to the preferred embodiments without departing from the scope of the invention, which is defined by the claims.

Claims

1. A method of tracking data requests from a user's computer via a data network to a web server, the method comprising: providing code representing a web page to the user's computer, the code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; receiving at the web server a data request from the user's computer for data corresponding to one of the modified hyperlinks and including said identification data; and using said identification data to track data requests from the user's computer.
2. A method according to claim 1, wherein said executable instructions include instructions to the user's computer to look for said identification data in a web browser cookie on the user's computer.
3. A method according to claim 1 or claim 2, wherein said executable instructions include instructions to the user's computer to look for said identification data in a window name of the user's web browser.
4. A method according to any previous claim, wherein said executable instructions include instructions to the user's computer to look for said identification data in a web address of a currently displayed webpage in the user's web browser.
5. A method according to any previous claim, wherein said executable instructions include code to generate new identification data if existing identification data is not found on the user's computer.
6. A method according to claim 5, wherein the identification data is generated by the user's computer using a current date and time, and a pseudo random number.
7. A method according to any previous claim, wherein said code further comprises executable instructions to set a visitor identification cookie on the user's computer for identifying the user's computer during a period of at least several days, and executable instructions to read said visitor identification cookie and transmit data stored in the cookie to a remote server.
8. A method according to any previous claim, further comprising providing code to record user interaction events that occur within the user's web browser and to transmit said event recordings to a remote server.
9. A method according to claim 8, wherein the code includes instructions to the user's computer to store the event recordings in each of a web browser script variable and an event cookie on the user's computer.
10. A method according to claim 8 or claim 9, wherein in response to the remote server receiving event recordings from the user's computer, the remote server sends code to the user's computer, the code including executable instructions to delete from the event cookie the event recordings that have been received by the remote server.
11. A method according to any previous claim, further comprising providing code to generate a page sequence identifier each time a web page is received by the browser from the web server, and to transmit said page sequence identifier to the web server or to a third party server.
12. A method according to claim 11, wherein said hyperlinks in the web page are modified by adding both the identification data and the page sequence identifier.
13. A method according to any previous claim, wherein said code further comprises cache busting code to reload at least one element of a web page from its original location even if part of the web page is cached.
14. A method according to any previous claim, further comprising providing code to request a file from a third party server, and to supply additional information from the user's computer to the third party server along with the file request.
15. A method according to claim 14, wherein said additional information supplied to the third party server includes identification data identifying the user's computer to the third party server.
16. A method according to claim 14 or claim 15, comprising sending a cookie from the third party server to the user's computer in response to the user's computer sending said request for a file.
17. A method according to any previous claim, wherein said code representing a web page is generated by a web server, and said executable instructions are generated by a third party server and incorporated into the code representing the web page.
18. A method according to any previous claim, further comprising estimating a likelihood that one or more particular methods of web page navigation were used at the user's computer to move between pages of the website, by analysing relationships between the web pages that have been requested by the user's computer.
19. A method according to any previous claim, further comprising reconstructing a record of the data requests made by the user's computer and analysing the reliability of the record by calculating a score indicating the self consistency of the record.
20. A method according to any previous claim, further comprising providing proxy code to the web server, to configure the web server to act as a proxy server for a third party server, said proxy code comprising instructions to forward the identification data from the web server to the third party server.
21. A method on a third party server of tracking data requests from a user's computer via a data network to a web server, the method comprising: providing tracking code for including within web page code representing a web page from the web server, the tracking code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; receiving user data request information from the web server, including information identifying a user requested web page and said identification data; and using said information and said identification data to generate a record of web browsing activity on the user's computer.
22. A method on a web server of tracking data requests from a user's computer via a data network to the web server, the method comprising: sending code representing a web page to the user's computer, the code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; receiving a data request from the user's computer for data corresponding to one of the modified hyperlinks and including said identification data; and sending said identification data and information identifying a user requested web page to a third party server to track data requests from the user's computer.
23. A third party server for tracking data requests from a user's computer via a data network to a web server, the third party server comprising: means for providing tracking code to be added to web page code representing a web page from the web server, the tracking code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; means for receiving and storing user data request information from the web server, including information identifying a user requested web page and said identification data; and means for generating a record of web browsing activity on the user's computer, using said information and said identification data.
24. A web server for tracking data requests from a user's computer via a data network to the web server, the web server comprising: means for providing code representing a web page to the user's computer, the code including executable instructions to the user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; means for receiving a data request from the user's computer for data corresponding to one of the modified hyperlinks and including said identification data; and means for sending said identification data and information identifying a user requested web page to a third party server to track data requests from the user's computer.
25. A web server according to claim 24, wherein said executable instructions include instructions to the user's computer to look for said identification data in a web browser cookie or in other locally persistent web browser data on the user's computer.
26. A web server according to claim 24 or claim 25, wherein said executable instructions include instructions to the user's computer to look for said identification data in a window name of the user's web browser.
27. A web server according to any one of claims 24 to 26, wherein said executable instructions include instructions to the user's computer to look for said identification data in a web address of a currently displayed webpage in the user's web browser.
28. A web server according to any one of claims 24 to 27, wherein said executable instructions include code to generate new identification data if existing identification data is not found on the user's computer.
29. A web server according to claim 28, wherein the identification data is generated by the user's computer using a current date and time, and a pseudo random number.
30. A web server according to any one of claims 24 to 29, wherein said code further comprises executable instructions to set a visitor identification cookie on the user's computer for identifying the user's computer during a period of at least several days, and executable instructions to read said visitor identification cookie and transmit data stored in the cookie to a remote server.
31. A web server according to any one of claims 24 to 30, further comprising providing code to record user interaction events that occur within the user's web browser and to transmit said event recordings to a remote server.
32. A web server according to claim 31, wherein the code includes instructions to the user's computer to store the event recordings in each of a web browser script variable and an event cookie on the user's computer.
33. A web server according to claim 31 or claim 32, wherein in response to the remote server receiving event recordings from the user's computer, the remote server sends code to the user's computer, the code including executable instructions to delete from the event cookie the event recordings that have been received by the remote server.
34. A web server according to any one of claims 24 to 33, further comprising providing code to generate a page sequence identifier each time a web page is received by the browser from the web server, and to transmit said page sequence identifier to the web server or to a third party server.
35. A web server according to any one of claims 24 to 34, wherein said hyperlinks in the web page are modified by adding both the identification data and the page sequence identifier.
36. A web server according to any one of claims 24 to 35, wherein said code further comprises cache busting code to reload at least one element of a web page from its original location even if part of the web page is cached.
37. A web server according to any one of claims 24 to 36, further comprising providing code to request a file from a third party server, and to supply additional information from the user's computer to the third party server along with the file request.
38. A web server according to claim 37, wherein said additional information supplied to the third party server includes identification data identifying the user's computer to the third party server.
39. A web server according to claim 37 or claim 38, comprising sending a cookie from the third party server to the user's computer in response to the user's computer sending said request for a file.
40. A web server according to any one of claims 24 to 39, wherein said code representing a web page is generated by a web server, and said executable instructions are generated by a third party server and incorporated into the code representing the web page.
41. A web server according to any one of claims 24 to 40, further comprising estimating a likelihood that one or more particular methods of web page navigation were used at the user's computer to move between pages of the website, by analysing relationships between the web pages that have been requested by the user's computer.
42. A web server according to any one of claims 24 to 41, further comprising reconstructing a record of the data requests made by the user's computer and analysing the reliability of the record by calculating a score indicating the self consistency of the record.
43. A third party server for tracking data requests from a user's computer via a data network to a web server, the third party server comprising: a data store storing tracking code to be added to web page code representing a web page from the web server, the tracking code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; a network connector for sending the tracking code to the web server and for receiving user data request information from the web server, including information identifying a user requested web page and said identification data; and a processor for generating a record of web browsing activity on the user's computer, using said information and said identification data.
44. A web server for tracking data requests from a user's computer via a data network to the web server, the web server comprising: a data store storing code representing a web page to the user's computer, the code including executable instructions to a user's web browser to obtain or generate identification data for identifying the user's computer and to modify one or more hyperlinks in said web page by adding the identification data to each said hyperlink; and a network connector for receiving a data request from the user's computer for data corresponding to one of the modified hyperlinks and including said identification data, and for sending said identification data and information identifying a user requested web page to a third party server to track data requests from the user's computer.
45. A method of tracking data requests from a user's computer via a data network to a server on the internet, the method comprising providing code to the user's computer, the code comprising executable instructions to modify a window name of an internet browsing program running in a window on the user's computer by adding identification data to the window name; and providing a request to the user's computer for the identification data in the window name; and receiving the identification data from the user's computer.
46. A method according to claim 45, wherein the code comprises executable instructions to send the identification data to the server as part of a file request for a file on the server.
47. A method at an internet server of tracking data requests from a user's computer via a data network to the internet server, the method comprising requesting identification data stored in a window name of an internet browsing program running in a window on the user's computer; and using said identification data to identify data requests made from a particular user's computer.
48. An internet server for tracking data requests from a user's computer via a data network to the internet server, the internet server comprising: means for providing code to the user's computer, the code comprising executable instructions to modify a window name of an internet browsing program running in a window on the user's computer by adding identification data to the window name; and means for requesting the identification data in the window name from the user's computer.
49. An internet server for tracking data requests from a user's computer via a data network to the internet server, the internet server comprising means for requesting identification data stored in a window name of an internet browsing program running in a window on the user's computer; and means for using said identification data to identify data requests made from a particular user's computer.
50. An internet server for tracking data requests from a user's computer via a data network to the internet server, the internet server comprising: a data store for storing code to be sent to the user's computer, the code comprising executable instructions to modify a window name of an internet browsing program running in a window on the user's computer by adding identification data to the window name, and for storing code to request the identification data to be sent back from the user's computer to the internet server; and a network connector for sending the code to the user's computer and for receiving the identification data in the window name from the user's computer.
51. An internet server for tracking data requests from a user's computer via a data network to the internet server, the internet server comprising a data store for storing code to request identification data stored in a window name of an internet browsing program running in a window on the user's computer; a network connector for sending said request and receiving said identification data; and a processor for using said identification data to identify data requests made from a particular user's computer.
52. A method of receiving tracking data from a user's computer at a third party server, the method comprising providing code to configure a web server to act as a proxy server for said third party server, said code comprising instructions to forward tracking information received at the proxy server to the third party server; and providing tracking code to the proxy server to be run on the user's computer, the tracking code comprising code for sending tracking information to the third party server via the proxy server.
53. The method of claim 52, wherein the tracking code comprises instructions to send a cookie or an image request from the user's computer to the proxy server.
54. A web server configured to receive tracking data from a user's computer, the web server comprising means for forwarding said tracking data to a third party server; means for receiving data requests for the user's computer from the third party server; and means for forwarding the data requests to the user's computer.
55. A method of tracking events on a user's computer and sending tracking information to a server, the method comprising: running a script within a web browser on the user's computer to record events within the web browser, and storing the recorded events in a script variable; storing the information from the script variable in a cookie or as other locally persistent web browser data on the user's computer; sending event information for at least some of said events to the server; and deleting said event information from the cookie or other locally persistent web browser data after the server has confirmed receipt of said event information.
56. A method as claimed in claim 55, further comprising: receiving a script from the server in response to sending the event information, wherein said script is configured to delete the event data received by the server from the cookie or other locally persistent web browser data.
57. A method as claimed in claim 55 or claim 56, further comprising monitoring the length of the script variable, and sending information from the script variable to the server if the script variable length is greater than a predetermined threshold.
58. A method as claimed in claim 57, comprising manipulating the script variable to ensure it does not exceed a predetermined length.
59. A method as claimed in any one of claims 55 to 58, wherein the script variable comprises one or more page visit identifiers, each page visit identifier relating to an instance of a web page being received from the server or reloaded from a cached location, the page visit identifiers being used to indicate the web page for which each said event occurred.
60. A method of tracking a user's internet browsing activity, comprising obtaining and logging tracking information from the user's computer, said tracking information including page sequence identifiers each identifying an instance of a web page being received by the user's computer; and creating a chronological list of web pages received by the user's computer.
61. A method as claimed in claim 60, further comprising estimating a reliability score to indicate the reliability of the chronological list, using relationships between the web pages in the chronological list.
62. A method of obtaining data identifying a user's computer at a third party server, the method comprising: providing a web server with executable code to configure the web server to act as a proxy server to the third party server, where the web server sends code representing a web page to the user's computer, the code including a hyperlink to the web server; receiving a data request from the user's computer at the web server and forwarding the data request to the third party server, the data request including information identifying the user's computer.
63. A method of tracking data requests from a user's computer via a data network to a web server, the method comprising: receiving information identifying data requests made by a particular user's computer; and estimating a likelihood that one or more particular methods of navigation were used on the user's computer to navigate from a first to a second web page, using relationships between the received data requests.
64. A method as claimed in claim 63, further comprising using said likelihoods to estimate a reliability score to indicate the reliability that the navigation method has been correctly identified.
65. A carrier medium carrying computer readable code for controlling a computer to carry out the method of any one of claims 1 to 23, 45 to 47, 52, 53 or 55 to 64.
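
By way of illustration only, the event-recording method of claims 55 to 57 might be sketched as follows. This is a minimal model under stated assumptions, not the implementation described in the specification: the claims concern a script running in a web browser, whereas this sketch uses Python to model the logic. The class name EventBuffer, the "sequence:page-visit:event;" entry format, the numeric acknowledgement and the 64-character threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventBuffer:
    """Models the 'script variable' and its locally persistent copy (hypothetical)."""

    max_length: int = 64   # assumed threshold, in characters (claim 57)
    variable: str = ""     # stands in for the in-page script variable
    cookie: str = ""       # stands in for the cookie or other persistent data
    next_seq: int = 1      # assumed per-event sequence number

    def record(self, page_visit_id: str, event: str) -> Optional[str]:
        """Record one event and persist it locally.

        Returns the payload to send to the server when the variable has
        grown past the threshold, otherwise None. The identifier and event
        name are assumed not to contain ':' or ';'.
        """
        self.variable += f"{self.next_seq}:{page_visit_id}:{event};"
        self.next_seq += 1
        self.cookie = self.variable  # keep the persistent copy in step
        if len(self.variable) > self.max_length:
            return self.variable
        return None

    def confirm(self, last_seq_received: int) -> None:
        """Delete events the server has confirmed receiving (claim 55)."""
        entries = [e for e in self.variable.split(";") if e]
        kept = [e for e in entries if int(e.split(":")[0]) > last_seq_received]
        self.variable = "".join(e + ";" for e in kept)
        self.cookie = self.variable


if __name__ == "__main__":
    buf = EventBuffer(max_length=30)
    for visit, name in [("p1", "click"), ("p1", "scroll"), ("p2", "click")]:
        payload = buf.record(visit, name)
        if payload is not None:
            print("send to server:", payload)
    buf.confirm(2)  # the server acknowledges events 1 and 2
    print("still stored locally:", buf.cookie)
```

In this sketch, events are deleted only on confirmation, so an event that was sent but never acknowledged remains in the local store and can be sent again later.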