WO2001009771A9

WO2001009771A9 - Targeted advertising system

Info

Publication number: WO2001009771A9
Application number: PCT/US2000/020999
Authority: WO
Inventors: Robert M Giuli; Stanley George Fisher
Original assignee: Gen Dynamics Gov Sys Corp
Priority date: 1999-08-03
Filing date: 2000-08-02
Publication date: 2002-09-06
Also published as: WO2001009771A1; AU6617700A

Abstract

A client (106) accesses a content serving site (102) and information from this access is passively gathered by the content serving site (102). The information obtained from the client may be used to intelligently select content, such as advertisements, to include with the content accessed by the client. When a client (106) initially arrives, the client is assigned a unique anonymous identifier and an entry is created in a database (107). As the client (106) moves around the content serving site (102), the client's profile is updated in the database (107) based on the client's actions. The stored information can be analyzed and provided to Internet service providers (ISPs) to provide better service to their clients. Additionally, the information regarding client preferences can be used to alter the content of the information provided to the client (106) in real time using a database that stores context information for each page of the content serving site (102) and context information for each potential page from which the client (106) can be referred.

Description

TARGETED ADVERTISING SYSTEM

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of United States Provisional Patent

Application No. 60/146,955 filed on August 3, 1999.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a system for providing users with targeted

information, e.g., an advertisement, based on a profile of the user's preferences.

Description of the Related Art

The Internet excels at cheaply delivering information to a wide audience. Internet

sites that compile and make the information available to users, often called "content"

sites, are rapidly becoming an important element in national commerce. One way these

sites generate income is by charging advertisers to display advertisements in, e.g., a

banner, when their site is accessed. Users reading content at the site are exposed to the

advertisements.

To maximize the efficiency of the individual ads, it is desirable to tailor the

advertisements to the particular individual user. One conventional way to tailor

advertisements to a particular user is to have the user initially fill out a questionnaire

relating to the user's hobbies, demographic information, employment information, etc.

This method of obtaining information about the user, however, is burdensome to the user

and is limited to the questions in the questionnaire.

Thus, there is a need in the art to be able to obtain profile information specific to a user without having to explicitly question the user.

SUMMARY OF THE INVENTION

Systems and methods consistent with the present invention overcome the

shortcomings of the prior art by passively obtaining profile information regarding

individual users of a network.

Additional objects and advantages of the invention will be set forth in part in the

description which follows, and in part will be obvious from the description, or may be

learned by practice of the invention. The objects and advantages of the invention will be

realized and attained by means of the elements and combinations particularly pointed out

in the appended claims.

In accordance with the purpose of the invention, as embodied and broadly

described herein, systems and methods consistent with the invention target information to

a user of a network having content information by passively gathering parameters from a

user's request of content information, determining, from the parameters, a user profile,

and providing targeted information to the user based on the user profile.

In accordance with another aspect of the invention, systems and methods

consistent with the invention receive a user's request of content information from a first

server, determine parameters from the user's request, and send the parameters to a second

server for storing information regarding the user. Such systems and methods also provide

a database, used in connection with content information on a network, including

information regarding an address of the content information and information regarding a

plurality of addresses containing references to the address of content information.

It is to be understood that both the foregoing general description and the following

detailed description are exemplary and explanatory only and are not restrictive of the

invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of

this specification, illustrate the invention and together with the description, serve to

explain the principles of the invention. In the drawings,

Fig. 1 is a high level diagram of an exemplary computing system network;

Fig. 2 is a more detailed diagram of an exemplary computer system associated

with the present invention;

Fig. 3 is a high level diagram of processes run by a server in the exemplary

computing system network;

Fig. 4 is a schematic diagram of an exemplary page context generation engine

operating in accordance with the present invention; and

Fig. 5 is a schematic diagram of an exemplary intelligence vortex component

operating in accordance with the present invention.

DETAILED DESCRIPTION

Reference will now be made in detail to the present disclosure of the invention, an

example of which is illustrated in the accompanying drawings. Wherever possible, the

same reference numbers will be used throughout the drawings to refer to the same or like

parts.

A user interacts with a local device called a client. For simplicity, this user/client

combination will be referred to as a user or a client herein. The client accesses a content serving site operating consistent with the present invention and information from this

access is passively gathered by the content serving site. The information obtained from

the client may be used to intelligently select content, such as advertisements, to include

with the content accessed by the client.

More particularly, when a client initially arrives at the content serving site, the

server queries the client to determine if an anonymous identifier, such as a cookie or

certificate, has been previously stored with the client. If so, the information in the

anonymous identifier is used to identify a corresponding entry in a database. If not, the

client is assigned a unique anonymous identifier and an entry is created in a database.
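The lookup-or-assign step described above can be sketched as follows. This is a minimal illustration only: the class name ProfileRegistry, the in-memory map standing in for the database, and the use of a random UUID as the identifier are all assumptions made for the example and are not taken from the specification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the profile database. */
class ProfileRegistry {
    // identifier -> list of requested pages (a very small "profile")
    private final Map<String, List<String>> entries = new HashMap<>();

    /**
     * Returns the identifier to use for this client. An identifier that is
     * already known is reused; otherwise a new one is assigned and an empty
     * entry is created for it.
     */
    String resolve(String idFromClient) {
        if (idFromClient != null && entries.containsKey(idFromClient)) {
            return idFromClient;
        }
        String fresh = UUID.randomUUID().toString(); // assumption: any unique token would do
        entries.put(fresh, new ArrayList<String>());
        return fresh;
    }

    /** Appends one requested page to the client's entry, if the entry exists. */
    void recordRequest(String id, String url) {
        List<String> history = entries.get(id);
        if (history != null) {
            history.add(url);
        }
    }

    /** Returns a copy of the pages recorded for the identifier. */
    List<String> history(String id) {
        List<String> copy = new ArrayList<String>();
        List<String> stored = entries.get(id);
        if (stored != null) {
            copy.addAll(stored);
        }
        return copy;
    }
}
```

In this sketch a client that presents no identifier, or one the registry has never issued, is treated as a new arrival.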

Anonymity is a key concern of the present invention out of reasonable concern for

privacy of users of the system. Privacy is ensured by only passively looking to

information provided by the client with the request for information from the content

serving site to determine the client's behavior. Also, if placement of the anonymous

identifier is not permitted on the client side, that information is not provided to the

system. Nevertheless, a less responsible party may choose to forego the safeguards

provided by the system.

As the client moves around the content serving site (e.g., as the client requests

various web pages from the site or disconnects from the content serving site), the client's

profile is updated in the database based on the client's actions.

In the subsequent explanation of the invention, a preferred embodiment related to

a user browsing on the Internet will be described merely for convenience and by way of

example. As will be readily apparent to a person having skill in the area, the present invention is not so limited to browsing on the Internet. For example, the invention can be

applied to any information presentation in wide spread area demographic situations.

Fig. 1 is a high level diagram of an exemplary computing system network on

which the present invention may be implemented. The system includes a server 102

storing content information, such as web pages or downloadable files, and client

computers 106 capable of accessing the content information on server 102 through

network 104. Network 104 may be, for example, the Internet or a corporate Intranet.

Server 102 may be any of a number of known computers, or network of computers,

capable of delivering information to clients 106 over network 104. Similarly, clients 106

may be any of a number of known computers, or network of computers, capable of

requesting information from server 102.

To retrieve content information, such as a web page, stored on server 102, the user

of client 106 specifies a URL (uniform resource locator). The specified URL allows

software running on client 106, e.g., browsing software such as Netscape Corporation's

Navigator (TM) or Microsoft Corporation's Internet Explorer (TM), to initiate

communication with server 102 and access the desired content (web page), which the

client software interprets and provides to client 106. For example, the desired content can

be displayed on a CRT display.

Fig. 2 is a more detailed diagram of a computer system 200, which may be client

106 or server 102. Computer system 200 includes a processor 202 and a memory 204

coupled to processor 202 through a bus 206. Processor 202 fetches computer instructions

from memory 204 and executes those instructions. Processor 202 also (1) reads data

from and writes data to memory 204, (2) sends data and control signals through bus 206 to one or more computer output devices 220, (3) receives data and control signals through

bus 206 from one or more computer input devices 230 in accordance with the computer

instructions, and (4) transmits and receives data through bus 206 and router 225 to

network 104.

Memory 204 can include any type of computer memory including, without

limitation, random access memory (RAM), read-only memory (ROM), and storage

devices that include storage media such as magnetic and/or optical disks. Memory 204

includes a computer process 210, such as a web browser or web server software. A

computer process includes a collection of computer instructions and data that collectively

define a task performed by computer system 200.

Computer output devices 220 can include any type of computer output device,

such as a printer 224, a cathode ray tube (CRT) 222 (alternatively called a monitor or

display), a light-emitting diode (LED) display, or a liquid crystal display (LCD). CRT

display 222 preferably displays the graphical and textual information of the web browser.

Each of computer output devices 220 receives from processor 202 control signals and

data and, in response to such control signals, displays the received data.

User input devices 230 can include any type of user input device such as a

keyboard 232, a keypad (not shown), a pointing device, such as an electronic mouse 134,

a trackball (not shown), a lightpen (not shown), a touch-sensitive pad (not shown), a

digitizing tablet (not shown), thumb wheels (not shown), or a joystick (not shown). Each

of user input devices 230 generates signals in response to physical manipulation by a user

and transmits those signals through bus 206 to processor 202.

The process begins with the user's first contact with a content serving site that

employs a user profiler 110 consistent with the present invention. After this first contact,

user profiler 110 receives information related to the content serving site accessed by

client 106 and the client's request of information from the content serving site. With this

information, user profiler 110 generates a profile of client 106, to which it intelligently

matches targeted content information, such as banner ads, to also provide to client 106.

The previously mentioned anonymous identifier is used to identify the client for future

profiling.

User profiler 110 begins by gathering initial basic data from the client request,

including client 106 and server 102 information, and information for the Internet Hyper

Text Transfer Protocol (HTTP) searches and links. Such information can include date,

gateway interface, HTTP image formats accepted, HTTP character sets accepted, HTTP

encoding accepted, HTTP language accepted, the URL of the site accessed, the page

which referred the user to the current page, the HTTP user agent, path information for

executables on the server, a query string, the user's IP address, the user's host, the method

used to access the desired page, the script used in accessing the page, and the site server's

name and port.

To accomplish this, a process in server 102 hosting the content serving site traps

the basic data associated with the client's actions in a manner that is transparent to the

client. If an anonymous identifier has not been created, server 102 creates a unique,

anonymous identifier for the client and attempts to store the anonymous identifier on the

client side. To relieve the system from having to generate a unique number, the initial

basic data can be used to generate the unique identifier. For example, if information concerning a user's Ethernet ID is provided, this is a guaranteed unique number until the

year 2002. Alternatively, the system can combine data from a portion of the user's IP

address with the current date and time to generate the unique number.
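The second alternative, combining part of the IP address with the date and time, might look like the sketch below. The choice of the last two octets, the separator, and the timestamp pattern are assumptions for illustration; the specification does not say which portion of the address or which time resolution is used.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that derives an identifier from request data. */
class IdentifierSketch {
    // Assumed layout: year, month, day, hour, minute, second.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Joins the last two octets of a dotted IPv4 address with a timestamp. */
    static String fromIpAndTime(String ipv4, LocalDateTime when) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("expected dotted IPv4: " + ipv4);
        }
        return octets[2] + "-" + octets[3] + "-" + when.format(STAMP);
    }
}
```

Two clients that share the chosen octets and arrive within the same second would collide under this scheme, so a real deployment would likely need a finer timestamp or an extra counter.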

The process operating on server 102 then creates a packet containing the unique

identifier and the basic data and sends the packet to a server 107 operating as part of user

profiler 110.

Server 107 preferably receives the packet as part of an HTTP URL on port 777 in

a GET-Form format. Other formats can be used, based on appropriate engineering

tradeoffs. For example, the GET-format is limited in length and does not require

handshaking, and the POST method lengthens the time required to transfer information.

A sample GET-format of the URL is as follows:

http://www.gnosis.com:777?I=12345+D=2000:1231:2359+S=www.gte.net+K=travel+L=San Francisco, CA, USA+U=www.yahoo.com

With this format, which can be expanded up to the maximum character limit for a GET-

format URL (presently 2048 characters), I stands for the anonymous identifier of the client,

e.g. I=12345, D stands for the date and time of transfer, e.g. D=2000:1231:2359, S stands

for the source server 102, e.g. S=www.gte.net or an IP address, K stands for a keyword,

e.g. K=travel, L stands for the location of the client gateway, e.g. L=San Francisco, CA,

USA, and U stands for linked URL, e.g. U=www.yahoo.com or IP address. The U

parameter is last in each aspect of the preferred embodiment.
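A receiver could split such a URL into its fields roughly as sketched below. The class name PacketFormat is hypothetical, and the sketch assumes the identifier key is the capital letter I and that fields are separated by "+". Treating everything after "+U=" as the linked URL is one plausible reason for keeping U last, since a linked URL may itself contain separator characters; the specification does not state this reason.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the "+"-separated key=value packet format. */
class PacketFormat {
    /**
     * Splits the text after '?' into fields. Everything following "+U=" is
     * taken verbatim as the linked URL; the remaining text is split on '+',
     * and each piece on its first '='.
     */
    static Map<String, String> parse(String url) {
        Map<String, String> fields = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return fields; // no parameters present
        }
        String query = url.substring(q + 1);
        String linked = null;
        int u = query.indexOf("+U=");
        if (u >= 0) {
            linked = query.substring(u + 3);
            query = query.substring(0, u);
        }
        for (String pair : query.split("\\+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        if (linked != null) {
            fields.put("U", linked.trim());
        }
        return fields;
    }
}
```

Applied to the sample URL, this yields six fields, with the location value kept intact despite its embedded commas and spaces.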

Once the packet is sent, server 102 operates without regard to success of the

arrival of the packet at server 107. A precise transfer is not required in the preferred embodiment, because additional data might not serve to increase the profiling of the user

parameters that much.

New packets can be generated and sent to update the user profile in real time as

the user performs various browsing operations, e.g., a new search, a disconnect, and

following a link embedded in the current page.

Server 107 includes a daemon 301, parse and store threads 302, a targeting

intelligence engine (see Figs. 4-5), and a database 303, of which elements 301-303 are

schematically illustrated in Fig. 3.

Daemon 301 is a program that continually monitors port 777. A sample

Java program that demonstrates the concept follows.

import java.io.*;
import java.net.*;

/**
 * This program camps on port 777 waiting for inbound udp xfers
 * Usage as above
 * 1. URL specs must always be last
 * 2. quotes not included in the udp call
 * 3. Total call size limited to 256 Characters
 */
public class echo777 {
    OutputStream out = null;

    public static void main(String args[]) {
        try {
            //port number, 777 or first parameter
            int port = 777;
            if (args.length == 1) port = Integer.parseInt(args[0]);
            ServerSocket ss = new ServerSocket(port); //open a socket
            //wait in infinite loop for inbound xfers
            int transaction = 0;
            while (true) {
                transaction++;
                Socket client = ss.accept();
                BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                PrintWriter echo_out = new PrintWriter(new OutputStreamWriter(client.getOutputStream()));
                echo_out.println("HTTP/1.0 200");
                echo_out.println("Content-Type: text/plain");
                echo_out.println();
                echo_out.println();
                System.out.print("transaction " + transaction + " ");
                System.out.println("CONNECTION TO PORT 777 SUCCESSFUL");
                echo_out.flush();
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.length() == 0) break;
                    echo_out.println(line);
                } //line
                //close streams
                echo_out.close();
                in.close();
                client.close();
                System.out.println("end of echo message");
                System.out.flush();
                System.gc();
                if (transaction > 2) break;
            } //while true
        } //try
        catch (Exception e) {
            System.err.println(e);
            System.err.println("usage violated - daemon777 <optional port>");
        } //catch
    } //main
} //echo777

The primary function of server 107 is to respond as quickly as possible to the

datagram in the URL and not to do anything with that datagram.

A function of daemon 301 is to initiate timed parse and store threads 302. Each

instance of a datagram passed to the daemon will spawn one of these threads.

A function of each parse and store thread 302 is to collect the information

received by the daemon, parse the information to make sense of it, link up the parsed

information for storage in the database 303, and terminate after completing its mission. If

a parse and store thread cannot complete its mission in a predetermined time, the thread

302 is terminated automatically. Each thread 302 then passes the parsed information to

database 303.
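One way to give each parse and store task a time limit is sketched below using the standard java.util.concurrent classes. The class and method names are hypothetical, and the specification does not say how the timeout is enforced; this is only an illustration of the "terminate if not finished in a predetermined time" behavior.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that runs one task on its own thread with a deadline. */
class TimedWorker {
    /**
     * Waits at most timeoutMillis for the task. Returns the task's result,
     * or null if the task timed out, was interrupted, or threw an exception.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the overdue worker
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            return null; // the task itself failed
        } finally {
            pool.shutdownNow(); // the worker thread is not reused
        }
    }
}
```

Because the daemon is meant to answer quickly, a caller would typically hand the datagram to such a worker and return at once rather than wait on the result as this helper does.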

Database 303 stores the parsed information. Preferably, database 303 is an Oracle

database, because it is a real-time, highly scalable, multi-access database with known

functionality. Other databases could be used, e.g., ODI Persistent Object Store.

Once the information is stored in the database 303, it can be analyzed. The focus

of the analysis can be to determine what content a client 106 visiting the content serving

site on the server 102 is interested in.

The stored information can be provided to Internet service providers (ISPs) to

provide better service to their clients. A site owner could use the information to improve

the content of the site. Additionally, the information regarding client preferences can be

used to alter the content of the information provided to the client 106 in real time.

If the information is provided to the site owner or ISP, the information does not

require a real-time response, so the process can be performed off-line or in a background

mode reniced to a low priority. The process does, however, require the intelligence of a

methodology to determine categories of client interest from the information stored in

database 303. For example, based on the URL references, or reverse URL references, keywords

can be determined that can be correlated to categories of client interest.

To examine a URL, a sample Java program follows:

import java.io.*;
import java.net.*;

/**
 * This simple program uses the URL class and its openStream() method to
 * download the contents of a URL and copy them to a file or to the console.
 **/
public class geturl {
    public static void main(String[] args) {
        InputStream in = null;
        OutputStream out = null;
        try {
            //check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of Arguments");
            //Setup the Streams, in & out
            URL url = new URL(args[0]);
            in = url.openStream();
            if (args.length == 2) out = new FileOutputStream(args[1]);
            else out = System.out;
            //Now copy bytes from the URL to the output stream
            byte[] buffer = new byte[4096];
            int bytes_read;
            while ((bytes_read = in.read(buffer)) != -1)
                out.write(buffer, 0, bytes_read);
        } //try
        //On exceptions, print error message and usage indicator
        catch (Exception e) {
            System.err.println(e);
            System.err.println("Usage: java GetURL <URL> [<filename>]");
        } //catch
        finally { //always close the streams, no matter what!
            try { in.close(); out.close(); } catch (Exception e) {}
        } //finally
    } //main
} //GetURL

This program obtains the URL information of the stored URL address, which can

then be scanned to find metatags placed for search engines as well as the text of the body

of the URL. This can then be indexed by both the anonymous identifier and the server

102, and resolved in the database with any keyword parameters already stored for that

anonymous identifier and server.
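The scan for search-engine metatags could be approximated as below. The regular expression is an assumption that only recognizes the common form in which the name attribute precedes the content attribute; it is a sketch, not a general HTML parser, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extractor for the "keywords" meta tag of a downloaded page. */
class MetaKeywords {
    // Matches <meta name="keywords" content="..."> with name before content.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*[\"']keywords[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the comma-separated keywords, trimmed and lower-cased. */
    static List<String> extract(String html) {
        List<String> keywords = new ArrayList<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            for (String k : m.group(1).split(",")) {
                String t = k.trim().toLowerCase();
                if (!t.isEmpty()) {
                    keywords.add(t);
                }
            }
        }
        return keywords;
    }
}
```

The resulting keyword list is the kind of data that could be stored against the anonymous identifier and server alongside any K parameters already received.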

To provide the targeting information to the site owner, the database is searched

with respect to the content of server 102 and the keywords found in the URLs. The

results are used to determine appropriate categories of interest, and those categories are relayed

to the site owner. The reverse site URLs, which make reference to the site, can also be

provided to give the site owner a list of potential advertisers. The information can be

vector matched in a similar manner to be described subsequently with respect to the

provision of real-time target information. Therefore, a server 102 employing the process

can improve their cost per thousand of impressions (CPM), and improve their percent

usage of real estate.

Vectored analysis of the client information is performed by the targeting

intelligence engine. Figs. 4 and 5 illustrate the main components of the targeting

intelligence engine, which determines psycho-graphic personality preferences using the

data gathered and stored in database 303. The targeting intelligence engine is designed to

give confidence intervals about a user's sex, age, and geographical location, for example.

In preparation for the transmission of targeted content from, e.g., Intelligent

Information Systems' Netgravity Advertisement server to client 106, context information

must be generated for each page of the content serving site. The generated context

information is stored in a database 404, for example, a relational database management system, such as Oracle. Additionally, context information for each potential page from

which the visitor can be referred is generated and stored in database 404.

As shown in Fig. 4, a page context generation engine 400 runs a context

application, e.g. an Oracle ConText application, against all of the pages of the content

serving sites that subscribe to the system to generate the context for each of those pages.

Oracle ConText is preferred because of its ability to determine the gist or theme of a

document on the web using standard SQL commands. Nevertheless, any software that

can read a document and determine its context would be acceptable. Context generation

is also performed against all pages on the Internet that have links to the pages of the

subscribing content serving sites. In Fig. 4, a generic server having a content serving site

is indicated at 402. Generated context information for all pages is stored in database 404

and is indexed by page URL.

Page context generation engine 400 is fully automated, and only requires the

URLs of the pages on which the targeted content will be served. The URLs can be

entered by an operator using a Java applet from a web browser interface. When a URL is

entered, the web page is extracted and run through the HTML filter, which removes

HTML tags and non-essential data. The resulting page content is passed to context

application 401 which generates and stores the page context vectors, indexed by URL.
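The HTML filter step could be sketched as follows. Which data counts as "non-essential" is not defined in the specification; this example assumes script blocks, style blocks, and comments are discarded along with the tags themselves, and the class name is hypothetical.

```java
/** Hypothetical HTML filter that reduces a page to its visible text. */
class HtmlFilter {
    /** Removes script/style blocks, comments, and tags, then collapses whitespace. */
    static String strip(String html) {
        // Drop whole <script>...</script> and <style>...</style> blocks.
        String s = html.replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ");
        // Drop HTML comments.
        s = s.replaceAll("(?s)<!--.*?-->", " ");
        // Drop any remaining tags, leaving their text content.
        s = s.replaceAll("(?s)<[^>]*>", " ");
        // Collapse runs of whitespace into single spaces.
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

The plain text returned here is what would be handed to the context application; character entities such as &amp;amp; are left undecoded in this sketch.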

In addition to context vectors for each page with which targeted content will be

provided, all potential referring pages are extracted by a URL robot (a.k.a. a spider) to

locate URLs which contain links to the host page. This can be performed by, e.g.,

database maintenance robot 405. In a similar manner as the host page, each referring page is filtered and passed to context application 401, which generates and stores the

page context vectors as a database record, indexed by URL.

After the context vectors for each page are determined, a page gender calculation

engine 403 develops a gender/value pair from the resulting context for each page. The

gender/value is stored in database 404, indexed by URL.

Figure 5 illustrates a preferred embodiment of an intelligence vortex component

500. When targeted content is requested of a content server 501, intelligence vortex

component 500, in real time, retrieves the stored context information from database 404

for both the page from the content serving site and the referring page.

Intelligence vortex component 500 includes a group of tasks used to determine the

most appropriate targeted content for a particular client 106 using information developed

from database 303 in user profiler 110.

A visitor gender analysis engine 504 extracts gender/value pairs stored in database

404 associated with client 106. An algorithm is applied using the weighted value of each

page gender, resulting in a more accurate determination of the visitor gender than could

be arrived at using a single gender datapoint. For example, the past, present, and to be

accessed pages could be used to calculate the gender. If client 106 is one who has been

previously assigned an anonymous identifier, all URLs associated with that client 106 can

be used to increase the accuracy of visitor gender analysis engine 504. If the client's

gender is already known and stored in database 303, then this process can be avoided.
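The specification does not give the weighting algorithm. As one possible reading, the sketch below treats each page's value as a non-negative weight and reports the weighted share of one gender across all pages associated with the client. The 'M'/'F' encoding, the class names, and the neutral 0.5 fallback are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical combiner for per-page gender/value pairs. */
class GenderEstimate {
    /** One page's gender/value pair; value is taken as a non-negative weight. */
    static final class Pair {
        final char gender;   // 'M' or 'F' (assumed encoding)
        final double value;  // weight contributed by the page

        Pair(char gender, double value) {
            this.gender = gender;
            this.value = value;
        }
    }

    /** Returns the weighted share of 'F' in [0,1], or 0.5 when there is no weight. */
    static double femaleShare(List<Pair> pairs) {
        double female = 0.0;
        double total = 0.0;
        for (Pair p : pairs) {
            total += p.value;
            if (p.gender == 'F') {
                female += p.value;
            }
        }
        return total == 0.0 ? 0.5 : female / total;
    }
}
```

Using every page in the client's history, rather than a single page, is what lets several weak indications add up to a firmer estimate.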

A visitor location component 505 uses client location information from database

303 or estimates a likely location from the client's IP address provided from server 107 and information from a local WhoIs(TM) command server (GTE, the assignee of the

present invention, provides such a server).

A page context analysis engine 503 extracts the context data from database 404 of

the pages associated with client 106. An algorithm is applied that determines content that

the client 106 is likely to find preferable. For example, the context of past, present, and

to be accessed pages could be used to calculate the client preference. If client 106 is one

that has been previously assigned an anonymous identifier, all URLs associated with that

client 106 can be used to increase the accuracy of page context analysis engine 503.

A target content dimensions determination engine 502 uses the output of visitor

gender analysis engine 504, page context analysis engine 503, and visitor location

component 505 to develop target preferences in the context of the known targeted content

available at target content server 501. Similarly, an age analysis engine could also be

provided to determine target preferences. Then, the targeted content is provided to client

106 in real time.
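How the engine matches preferences to available content is not spelled out. One common way to compare context vectors is cosine similarity, sketched below as an assumption: the client's aggregated context and each candidate item are represented as sparse term-weight maps, and the candidate with the highest similarity is chosen. All names are hypothetical, and gender and location outputs are omitted for brevity.

```java
import java.util.Map;

/** Hypothetical matcher between a client's context vector and candidate content. */
class ContextMatch {
    /** Cosine similarity between two sparse term-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Returns the key of the candidate most similar to the profile, or null if none. */
    static String best(Map<String, Double> profile,
                       Map<String, Map<String, Double>> candidates) {
        String bestKey = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> c : candidates.entrySet()) {
            double score = cosine(profile, c.getValue());
            if (score > bestScore) {
                bestScore = score;
                bestKey = c.getKey();
            }
        }
        return bestKey;
    }
}
```

A fuller version would fold the gender and location estimates into the score, for example as filters applied before the similarity ranking.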

A database maintenance robot 405 (see Fig. 4) runs continually in the background

at a low priority to ensure that database 404 does not contain cluttering, invalid records;

contains all valid pages; and does not contain improper context data. Database

maintenance robot 405 ensures that all records in database 404 are valid by successively

accessing the database by, e.g., walking the database, extracting a URL, and requesting

the page from the Internet using the URL. In the event that the page no longer exists

(Error 404), the record containing the URL is deleted from the database.
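The validity pass can be sketched as a walk over the stored records that deletes any whose URL now returns 404. To keep the example self-contained, the HTTP check is passed in as a function; the class name, the map standing in for database 404, and the probe parameter are assumptions. A real probe could, for instance, use java.net.HttpURLConnection and its getResponseCode() method.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical maintenance pass over URL-indexed context records. */
class DeadLinkSweep {
    /**
     * Walks the records and deletes those whose URL answers 404.
     * statusOf is a caller-supplied probe returning the HTTP status for a URL.
     * Returns the number of records removed.
     */
    static int sweep(Map<String, String> contextByUrl, ToIntFunction<String> statusOf) {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = contextByUrl.entrySet().iterator();
        while (it.hasNext()) {
            String url = it.next().getKey();
            if (statusOf.applyAsInt(url) == 404) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }
}
```

Only a definite 404 triggers deletion in this sketch; transient failures such as timeouts leave the record in place for a later pass.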

To ensure that all valid pages are in database 404, database maintenance robot 405

continually spiders the web, seeking new URLs which contain links to the banner host pages. If such a URL is found, it submits the web page for inclusion in the database 404,

after performing the above filtering and context generation as done for the other referring

pages.

To ensure that the context of each valid page is appropriate and current, database

maintenance robot 405 periodically updates the context data for each linking web page

and banner host page in database 404.

Other embodiments of the invention will be apparent to those skilled in the art

from consideration of the specification and practice of the invention disclosed herein. It

is intended that the specification and examples be considered as exemplary only, with a

true scope and spirit of the invention being indicated by the following claims.

Claims

WHAT IS CLAIMED:

1. A method of providing targeted information to a user of a network having

content information, comprising:

passively gathering parameters from a user's request of content information;

determining, from the parameters, a user profile; and

providing targeted information to the user based on the user profile.

2. A method according to claim 1, wherein the gathered parameters include

time parameters, user-side parameters, and network-side parameters.

3. A method according to claim 1, further comprising:

storing the gathered parameters in a database, wherein the determining step

includes using a history of gathered parameters to determine a user profile.

4. A method according to claim 1, wherein the determining step includes

determining a user's gender profile.

5. A method according to claim 1, wherein the determining step includes

determining a user's location profile.

6. A method according to claim 1, wherein the determining step includes

determining a user's preferences based on the content information.

7. A method of gathering user data, comprising:

receiving a user's request of content information from a first server;

determining parameters from the user's request; and

sending the parameters to a second server for storing information regarding the

user.

8. A method according to claim 7, further comprising:

assigning an identifier for the user and causing the identifier to be stored on the

user's system.

9. A method according to claim 8, wherein the assigning step includes

sending the identifier from the first server to the second server.

10. A method according to claim 8, further comprising:

storing the parameters in a database indexed by the identifier.

11. A method according to claim 7, wherein the method operates in

association with the Internet, and wherein the sending step includes transmitting the

parameters in GET-format.

12. A database used in connection with content information on a network,

comprising:

information regarding an address of the content information; and

information regarding a plurality of addresses containing references to the address

of content information.

13. A database according to claim 12, wherein the information regarding an

address of the content information includes a context of the content information.

14. A database according to claim 12, wherein the information regarding the

plurality of addresses includes a context of information associated with the plurality of

addresses.

15. A database according to claim 12, wherein the information regarding an

address of the content information includes the gender value pairs.

16. A database according to claim 15, wherein information regarding the

plurality of addresses includes gender value pairs associated with the plurality of

addresses.

17. A method of providing targeted information to a user of a network having

content information using a database comprising:

passively gathering parameters from a user's request of content information;

determining, from the parameters and information stored in the database, a user

profile; and

providing targeted information to the user based on the user profile.

18. An apparatus which provides targeted information to a user of a network

having content information, comprising:

means for passively gathering parameters from a user's request of content

information;

means for determining, from the parameters, a user profile; and

means for providing targeted information to the user based on the user profile.

19. An apparatus according to claim 18, wherein the gathered parameters

include time parameters, user-side parameters, and network-side parameters.

20. An apparatus according to claim 18, further comprising:

a database which stores the gathered parameters, wherein the user profile is

determined using a history of gathered parameters.

21. An apparatus according to claim 18, wherein the means for determining

determines a user's gender profile.

22. An apparatus according to claim 18, wherein the means for determining

determines a user's location profile.

23. An apparatus according to claim 18, wherein the means for

determining determines a user's preferences based on the content information.

24. An apparatus according to claim 18, wherein the network is the Internet.

25. An apparatus according to claim 18, wherein the network is an Intranet.

26. An apparatus which gathers user data, comprising:

a first server for receiving a user's request of content information;

means for determining parameters from the user's request; and

means for sending the parameters to a second server for storing information

regarding the user.

27. An apparatus according to claim 26, further comprising:

means for assigning an identifier for the user and causing the identifier to be

stored on the user's system.

28. An apparatus according to claim 27, wherein the means for assigning

includes sending the identifier from the first server to the second server.

29. An apparatus according to claim 27, further comprising:

storing the parameters in a database indexed by the identifier.

30. An apparatus according to claim 26, wherein the apparatus operates in

association with the Internet, and wherein the means for sending transmits the parameters in

GET-format.