TARGETED ADVERTISING SYSTEM
CROSS REFERENCE TO RELATED APPLICATION
This application claims the benefit of United States Provisional Patent
Application No. 60/146,955 filed on August 3, 1999.
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a system for providing users with targeted
information, e.g., an advertisement, based on a profile of the user's preferences.
Description of the Related Art
The Internet excels at cheaply delivering information to a wide audience. Internet
sites that compile and make the information available to users, often called "content"
sites, are rapidly becoming an important element in national commerce. One way these
sites generate income is by charging advertisers to display advertisements in, e.g., a
banner, when their site is accessed. Users reading content at the site are exposed to the
advertisements.
To maximize the efficiency of the individual ads, it is desirable to tailor the
advertisements to the particular individual user. One conventional way to tailor
advertisements to a particular user is to have the user initially fill out a questionnaire
relating to the user's hobbies, demographic information, employment information, etc.
This method of obtaining information about the user, however, is burdensome to the user
and is limited to the questions in the questionnaire.
Thus, there is a need in the art to be able to obtain profile information specific to a user without having to explicitly question the user.
SUMMARY OF THE INVENTION
Systems and methods consistent with the present invention overcome the
shortcomings of the prior art by passively obtaining profile information regarding
individual users of a network.
Additional objects and advantages of the invention will be set forth in part in the
description which follows, and in part will be obvious from the description, or may be
learned by practice of the invention. The objects and advantages of the invention will be
realized and attained by means of the elements and combinations particularly pointed out
in the appended claims.
In accordance with the purpose of the invention, as embodied and broadly
described herein, systems and methods consistent with the invention target information to
a user of a network having content information by passively gathering parameters from a
user's request of content information, determining, from the parameters, a user profile,
and providing targeted information to the user based on the user profile.
In accordance with another aspect of the invention, systems and methods
consistent with the invention receive a user's request of content information from a first
server, determine parameters from the user's request, and send the parameters to a second
server for storing information regarding the user. Such systems and methods also provide
a database, used in connection with content information on a network, including
information regarding an address of the content information and information regarding a
plurality of addresses containing references to the address of content information.
It is to be understood that both the foregoing general description and the following
detailed description are exemplary and explanatory only and are not restrictive of the
invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of
this specification, illustrate the invention and together with the description, serve to
explain the principles of the invention. In the drawings,
Fig. 1 is a high level diagram of an exemplary computing system network;
Fig. 2 is a more detailed diagram of an exemplary computer system associated
with the present invention;
Fig. 3 is a high level diagram of processes run by a server in the exemplary
computing system network;
Fig. 4 is a schematic diagram of an exemplary page context generation engine
operating in accordance with the present invention; and
Fig. 5 is a schematic diagram of an exemplary intelligence vortex component
operating in accordance with the present invention.
DETAILED DESCRIPTION
Reference will now be made in detail to the present disclosure of the invention, an
example of which is illustrated in the accompanying drawings. Wherever possible, the
same reference numbers will be used throughout the drawings to refer to the same or like
parts.
A user interacts with a local device called a client. For simplicity, this user/client
combination will be referred to as a user or a client herein. The client accesses a content
serving site operating consistent with the present invention and information from this
access is passively gathered by the content serving site. The information obtained from
the client may be used to intelligently select content, such as advertisements, to include
with the content accessed by the client.
More particularly, when a client initially arrives at the content serving site, the
server queries the client to determine if an anonymous identifier, such as a cookie or
certificate, has been previously stored with the client. If so, the information in the
anonymous identifier is used to identify a corresponding entry in a database. If not, the
client is assigned a unique anonymous identifier and an entry is created in a database.
Anonymity is a key concern of the present invention out of reasonable concern for
the privacy of users of the system. Privacy is ensured by looking only passively to
information provided by the client with its request for information from the content
serving site to determine the client's behavior. Also, if placement of the anonymous
identifier is not permitted on the client side, that information is not provided to the
system. Nevertheless, a less responsible party may choose to forgo the safeguards
provided by the system.
As the client moves around the content serving site (e.g., as the client requests
various web pages from the site or disconnects from the content serving site), the client's
profile is updated in the database based on the client's actions.
In the subsequent explanation of the invention, a preferred embodiment related to
a user browsing on the Internet will be described merely for convenience and by way of
example. As will be readily apparent to a person having skill in the art, the present
invention is not limited to browsing on the Internet. For example, the invention can be
applied to any situation in which information is presented to a widespread, demographically diverse audience.
Fig. 1 is a high level diagram of an exemplary computing system network on
which the present invention may be implemented. The system includes a server 102
storing content information, such as web pages or downloadable files, and client
computers 106 capable of accessing the content information on server 102 through
network 104. Network 104 may be, for example, the Internet or a corporate Intranet.
Server 102 may be any of a number of known computers, or network of computers,
capable of delivering information to clients 106 over network 104. Similarly, clients 106
may be any of a number of known computers, or network of computers, capable of
requesting information from server 102.
To retrieve content information, such as a web page, stored on server 102, the user
of client 106 specifies a URL (uniform resource locator). The specified URL allows
software running on client 106, e.g., browsing software such as Netscape Corporation's
Navigator (TM) or Microsoft Corporation's Internet Explorer (TM), to initiate
communication with server 102 and access the desired content (web page), which the
client software interprets and provides to client 106. For example, the desired content can
be displayed on a CRT display.
Fig. 2 is a more detailed diagram of a computer system 200, which may be client
106 or server 102. Computer system 200 includes a processor 202 and a memory 204
coupled to processor 202 through a bus 206. Processor 202 fetches computer instructions
from memory 204 and executes those instructions. Processor 202 also (1) reads data
from and writes data to memory 204, (2) sends data and control signals through bus 206
to one or more computer output devices 220, (3) receives data and control signals through
bus 206 from one or more computer input devices 230 in accordance with the computer
instructions, and (4) transmits and receives data through bus 206 and router 225 to
network 104.
Memory 204 can include any type of computer memory including, without
limitation, random access memory (RAM), read-only memory (ROM), and storage
devices that include storage media such as magnetic and/or optical disks. Memory 204
includes a computer process 210, such as a web browser or web server software. A
computer process includes a collection of computer instructions and data that collectively
define a task performed by computer system 200.
Computer output devices 220 can include any type of computer output device,
such as a printer 224, a cathode ray tube (CRT) 222 (alternatively called a monitor or
display), a light-emitting diode (LED) display, or a liquid crystal display (LCD). CRT
display 222 preferably displays the graphical and textual information of the web browser.
Each of computer output devices 220 receives from processor 202 control signals and
data and, in response to such control signals, displays the received data.
User input devices 230 can include any type of user input device such as a
keyboard 232, a keypad (not shown), a pointing device, such as an electronic mouse 134,
a trackball (not shown), a lightpen (not shown), a touch-sensitive pad (not shown), a
digitizing tablet (not shown), thumb wheels (not shown), or a joystick (not shown). Each
of user input devices 230 generates signals in response to physical manipulation by a user
and transmits those signals through bus 206 to processor 202.
The process begins with the user's first contact with a content serving site that
employs a user profiler 110 consistent with the present invention. After this first contact,
user profiler 110 receives information related to the content serving site accessed by
client 106 and the client's request of information from the content serving site. With this
information, user profiler 110 generates a profile of client 106 and intelligently
matches targeted content information, such as banner ads, to that profile for delivery to client 106.
The previously mentioned anonymous identifier is used to identify the client for future
profiling.
User profiler 110 begins by gathering initial basic data from the client request,
including client 106 and server 102 information, and information for the Internet Hyper
Text Transfer Protocol (HTTP) searches and links. Such information can include date,
gateway interface, HTTP image formats accepted, HTTP character sets accepted, HTTP
encoding accepted, HTTP language accepted, the URL of the site accessed, the page
that referred the user to the current page, the HTTP user agent, path information for
executables on the server, a query string, the user's IP address, the user's host, the method
used to access the desired page, the script used in accessing the page, and the site server's
name and port.
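As an illustration only, the following Java sketch shows how a trap on server 102 might pull several of these fields out of a raw HTTP request. It assumes an HTTP/1.x request using the standard header names (Accept, Accept-Charset, Accept-Encoding, Accept-Language, Referer, User-Agent, Host); the class name BasicDataTrap and the map keys are hypothetical. Fields such as the date, gateway interface, and server port, which a web server supplies from its own environment rather than from the request text, are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical trap: extracts basic data fields from a raw HTTP/1.x request. */
final class BasicDataTrap {
    // Standard request headers of interest, paired with illustrative field names.
    private static final String[][] HEADERS = {
        {"accept", "formatsAccepted"},
        {"accept-charset", "charsetsAccepted"},
        {"accept-encoding", "encodingsAccepted"},
        {"accept-language", "languagesAccepted"},
        {"referer", "referringPage"},
        {"user-agent", "userAgent"},
        {"host", "serverName"},
    };

    static Map<String, String> extract(String rawRequest, String clientIp) {
        Map<String, String> data = new LinkedHashMap<>();
        String[] lines = rawRequest.split("\r?\n");
        // Request line, e.g. "GET /travel/index.html?q=paris HTTP/1.0"
        String[] requestLine = lines[0].split(" ");
        String target = requestLine.length > 1 ? requestLine[1] : "/";
        int q = target.indexOf('?');
        data.put("method", requestLine[0]);
        data.put("path", q >= 0 ? target.substring(0, q) : target);
        data.put("queryString", q >= 0 ? target.substring(q + 1) : "");
        data.put("clientIp", clientIp);
        // Header lines continue until the first blank line.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) continue;
            String name = lines[i].substring(0, colon).trim().toLowerCase();
            String value = lines[i].substring(colon + 1).trim();
            for (String[] h : HEADERS) {
                if (h[0].equals(name)) data.put(h[1], value);
            }
        }
        return data;
    }
}
```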
To accomplish this, a process in server 102 hosting the content serving site traps
the basic data associated with the client's actions in a manner that is transparent to the
client. If an anonymous identifier has not been created, server 102 creates a unique,
anonymous identifier for the client and attempts to store the anonymous identifier on the
client side. To relieve the system from having to generate a unique number, the initial
basic data can be used to generate the unique identifier. For example, if information
concerning a user's Ethernet ID is provided, this is a guaranteed unique number until the
year 2002. Alternatively, the system can combine data from a portion of the user's IP
address with the current date and time to generate the unique number.
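The IP-plus-timestamp alternative might be sketched in Java as below. Using the last two octets, zero-padding them, and appending a millisecond timestamp are assumptions made for illustration, and AnonymousIdGenerator is a hypothetical name. Two clients sharing those octets within the same millisecond would collide, so a working system would still check a new identifier against the database.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical generator: part of the client IP address plus the current date and time. */
final class AnonymousIdGenerator {
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    /** Zero-pads the last two octets of an IPv4 address and appends a millisecond timestamp. */
    static String generate(String ipAddress, LocalDateTime now) {
        String[] octets = ipAddress.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipAddress);
        }
        String ipPart = String.format("%03d%03d",
            Integer.parseInt(octets[2]), Integer.parseInt(octets[3]));
        return ipPart + now.format(STAMP);
    }

    /** Convenience overload using the system clock. */
    static String generate(String ipAddress) {
        return generate(ipAddress, LocalDateTime.now());
    }
}
```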
The process operating on server 102 then creates a packet containing the unique
identifier and the basic data and sends the packet to a server 107 operating as part of user
profiler 110.
Server 107 preferably receives the packet as part of an HTTP URL on port 777 in
a GET-Form format. Other formats can be used, based on appropriate engineering
tradeoffs. For example, the GET format is limited in length but does not require additional
handshaking, whereas the POST method lengthens the time required to transfer information.
A sample GET-format of the URL is as follows:
http://www.gnosis.com:777?I=12345+D=2000:1231:2359+S=www.gte.net+K=travel+L=San Francisco, CA, USA+U=www.yahoo.com
With this format, which can be expanded up to the maximum character limit for a GET-
format URL (presently 2048 characters), I stands for the anonymous identifier of the client,
e.g., I=12345; D stands for the date and time of transfer, e.g., D=2000:1231:2359; S stands
for the source server 102, e.g., S=www.gte.net or an IP address; K stands for a keyword,
e.g., K=travel; L stands for the location of the client gateway, e.g., L=San Francisco, CA,
USA; and U stands for the linked URL, e.g., U=www.yahoo.com or an IP address. The U
parameter is last in each aspect of the preferred embodiment.
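A minimal Java sketch of building and parsing this packet follows, assuming the identifier key is the letter I and that fields are separated by '+' exactly as in the sample. Keeping U last lets a parser treat everything after "+U=" as the linked URL, so a URL that itself contains '+' or '=' is not split apart; that rationale is an inference, not something stated above. ProfilePacket is a hypothetical name, and a real implementation would also need to percent-encode values such as the spaces in the location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the I/D/S/K/L/U packet carried in the URL's query string. */
final class ProfilePacket {
    /** Builds the query string with U deliberately placed last. */
    static String build(String id, String date, String source,
                        String keyword, String location, String linkedUrl) {
        return "I=" + id + "+D=" + date + "+S=" + source + "+K=" + keyword
            + "+L=" + location + "+U=" + linkedUrl;
    }

    /** Parses a query string; everything after "+U=" is taken verbatim as the linked URL. */
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        String head = query;
        int u = query.indexOf("+U=");
        if (u >= 0) {
            head = query.substring(0, u);
        }
        // Remaining fields are simple key=value pairs separated by '+'.
        for (String pair : head.split("\\+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1));
        }
        if (u >= 0) fields.put("U", query.substring(u + 3));
        return fields;
    }
}
```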
Once the packet is sent, server 102 operates without regard to whether the packet
successfully arrives at server 107. A precise transfer is not required in the preferred
embodiment, because the additional data carried by any one packet might not improve
the profiling of the user parameters by much.
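The send-and-forget behavior could look roughly like the following Java sketch, which writes one GET request on a background thread and never reads a reply. The request path, the 500 ms connect timeout, and encoding spaces as %20 are assumptions; PacketSender is a hypothetical name.

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: sends one packet to server 107 and never waits for a reply. */
final class PacketSender {
    /** Starts a daemon thread that writes a single GET request; any failure is ignored. */
    static Thread sendAndForget(String host, int port, String query) {
        Thread sender = new Thread(() -> {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500); // short connect timeout
                String request = "GET /?" + query.replace(" ", "%20") + " HTTP/1.0\r\n"
                    + "Host: " + host + "\r\n\r\n";
                OutputStream out = socket.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // No response is read: delivery is best effort only.
            } catch (Exception ignored) {
                // A lost packet costs at most one profile update.
            }
        });
        sender.setDaemon(true);
        sender.start();
        return sender;
    }
}
```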
New packets can be generated and sent to update the user profile in real time as
the user performs various browsing operations, e.g., a new search, a disconnect, and
following a link embedded in the current page.
Server 107 includes a daemon 301, parse and store threads 302, a targeting
intelligence engine (see Figs. 4-5), and a database 303, of which elements 301-303 are
schematically illustrated in Fig. 3.
Daemon 301 is a program that continually monitors port 777. A sample
Java program that demonstrates the concept follows.
import java.io.*;
import java.net.*;

/**
 * This program camps on port 777 waiting for inbound transfers.
 * Usage: java echo777 [port]
 * 1. URL specs must always be last.
 * 2. Quotes are not included in the call.
 * 3. Total call size is limited to 256 characters.
 */
public class echo777 {
    public static void main(String[] args) {
        int port = 777; // port number: 777, or the first command-line parameter
        try {
            if (args.length == 1) port = Integer.parseInt(args[0]);
            try (ServerSocket ss = new ServerSocket(port)) { // open a socket
                int transaction = 0;
                while (true) { // wait in a loop for inbound transfers
                    transaction++;
                    try (Socket client = ss.accept();
                         BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                         PrintWriter echoOut = new PrintWriter(
                             new OutputStreamWriter(client.getOutputStream()))) {
                        // minimal HTTP response header, terminated by a blank line
                        echoOut.println("HTTP/1.0 200 OK");
                        echoOut.println("Content-Type: text/plain");
                        echoOut.println();
                        System.out.print("transaction " + transaction + " ");
                        System.out.println("CONNECTION TO PORT " + port + " SUCCESSFUL");
                        // echo the request lines back until the blank line ending the headers
                        String line;
                        while ((line = in.readLine()) != null && line.length() != 0) {
                            echoOut.println(line);
                        }
                        echoOut.flush();
                    } // leaving the try block closes the streams and the client socket
                    System.out.println("end of echo message");
                    if (transaction > 2) break; // demonstration only: stop after three transactions
                }
            }
        } catch (Exception e) {
            System.err.println(e);
            System.err.println("usage violated - echo777 <optional port>");
        }
    }
}
The primary function of server 107 is to respond as quickly as possible to the
datagram in the URL, without itself doing any processing of that datagram.
A function of daemon 301 is to initiate timed parse and store threads 302. Each
instance of a datagram passed to the daemon will spawn one of these threads.
A function of each parse and store thread 302 is to collect the information
received by the daemon, parse the information to make sense of it, link up the parsed
information for storage in the database 303, and terminate after completing its mission. If
a parse and store thread cannot complete its mission in a predetermined time, the thread
302 is terminated automatically. Each thread 302 then passes the parsed information to
database 303.
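One way to realize timed parse and store threads in current Java is sketched below: an ExecutorService runs one task per datagram, and a scheduled watchdog cancels any task still running after its deadline. The in-memory ConcurrentHashMap stands in for database 303, and the parsing (keying each record on the I field) is deliberately minimal; both are assumptions, and ParseAndStoreDispatcher is a hypothetical name. Cancellation interrupts the worker thread, so a task stops early only if its code responds to interruption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one timed parse-and-store task per datagram received by the daemon. */
final class ParseAndStoreDispatcher {
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
    // Stand-in for database 303: anonymous identifier -> accumulated datagrams.
    private final Map<String, String> database = new ConcurrentHashMap<>();

    /** Spawns a task for one datagram and cancels it if it runs past timeoutMillis. */
    Future<?> dispatch(String datagram, long timeoutMillis) {
        Future<?> task = workers.submit(() -> {
            // Parse: take the identifier from the leading "I=" field.
            String id = datagram.split("\\+")[0].replaceFirst("^I=", "");
            // Link and store: append this datagram to the client's record.
            database.merge(id, datagram, (old, added) -> old + "\n" + added);
        });
        watchdog.schedule(() -> task.cancel(true), timeoutMillis, TimeUnit.MILLISECONDS);
        return task;
    }

    String recordFor(String id) {
        return database.get(id);
    }

    void shutdown() {
        workers.shutdown();
        watchdog.shutdownNow();
    }
}
```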
Database 303 stores the parsed information. Preferably, database 303 is an Oracle
database, because it is a real-time, highly scalable, multi-access database with known
functionality. Other databases could be used, e.g., ODI Persistent Object Store.
Once the information is stored in the database 303, it can be analyzed. The focus
of the analysis can be to determine what content a client 106 visiting the content serving
site on server 102 is interested in.
The stored information can be provided to Internet service providers (ISPs) to
provide better service to their clients. A site owner could use the information to improve
the content of the site. Additionally, the information regarding client preferences can be
used to alter the content of the information provided to the client 106 in real time.
If the information is provided to the site owner or ISP, the information does not
require a real-time response, so the process can be performed off-line or in a background
mode reniced to a low priority. The process does, however, require the intelligence of a
methodology to determine categories of client interest from the information stored in
database 303.
For example, based on the URL references, or reverse URL references, keywords
can be determined that can be correlated to categories of client interest.
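As a hedged illustration of correlating URL references with categories of interest, the Java sketch below splits a URL into lowercase words and looks each one up in a small keyword table. The table entries and the class name UrlCategorizer are invented for the example; the actual correlation methodology is left open above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: correlates words found in a URL with categories of client interest. */
final class UrlCategorizer {
    // Illustrative keyword-to-category table (invented entries).
    private static final Map<String, String> KEYWORD_CATEGORY = Map.of(
        "travel", "Travel",
        "hotels", "Travel",
        "golf", "Sports",
        "stocks", "Finance");

    /** Splits host, path, and query into lowercase words and collects matching categories. */
    static Set<String> categories(String url) {
        Set<String> found = new TreeSet<>();
        for (String token : url.toLowerCase().split("[^a-z0-9]+")) {
            String category = KEYWORD_CATEGORY.get(token);
            if (category != null) found.add(category);
        }
        return found;
    }
}
```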
To examine a URL, a sample Java program follows:
import java.io.*;
import java.net.*;

/**
 * This simple program uses the URL class and its openStream() method to
 * download the contents of a URL and copy them to a file or to the console.
 */
public class GetURL {
    public static void main(String[] args) {
        InputStream in = null;
        OutputStream out = null;
        try {
            // check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of arguments");
            // set up the streams, in & out
            URL url = new URL(args[0]);
            in = url.openStream();
            if (args.length == 2) out = new FileOutputStream(args[1]);
            else out = System.out;
            // now copy bytes from the URL to the output stream
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1)
                out.write(buffer, 0, bytesRead);
            out.flush();
        } catch (Exception e) {
            // on exceptions, print an error message and usage indicator
            System.err.println(e);
            System.err.println("Usage: java GetURL <URL> [<filename>]");
        } finally {
            // always close the streams, no matter what (but leave System.out open)
            try {
                if (in != null) in.close();
                if (out != null && out != System.out) out.close();
            } catch (IOException e) {
                // ignore errors on close
            }
        }
    }
}
This program retrieves the content stored at a URL address, which can
then be scanned to find metatags placed for search engines as well as the text of the body
of the page. This text can then be indexed by both the anonymous identifier and server
102, and resolved in the database with any keyword parameters already stored for that
anonymous identifier and server.
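The metatag scan can be sketched with a regular expression, as below. This assumes well-behaved markup in which the name attribute precedes content and both values are double-quoted; a robust version would use a real HTML parser. PageScanner is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts search-engine keyword metatags from a downloaded page. */
final class PageScanner {
    // Matches <meta name="keywords" content="...">, name before content, double-quoted.
    private static final Pattern META_KEYWORDS = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"keywords\"\\s+content\\s*=\\s*\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    /** Returns the comma-separated keywords, trimmed and lowercased, in page order. */
    static List<String> metaKeywords(String html) {
        List<String> keywords = new ArrayList<>();
        Matcher m = META_KEYWORDS.matcher(html);
        while (m.find()) {
            for (String k : m.group(1).split(",")) {
                String word = k.trim().toLowerCase();
                if (!word.isEmpty()) keywords.add(word);
            }
        }
        return keywords;
    }
}
```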
To provide the targeting information to the site owner, the database is searched
with respect to the content of server 102 and the keywords found in the URLs. The
results are used to determine appropriate categories of interest, and those categories are relayed
to the site owner. The reverse site URLs, which make reference to the site, can also be
provided to give the site owner a list of potential advertisers. The information can be
vector matched in a similar manner to be described subsequently with respect to the
provision of real-time targeted information. Therefore, a server 102 employing the process
can improve its cost per thousand impressions (CPM) and the percentage usage of its
advertising real estate.
Vectored analysis of the client information is performed by the targeting
intelligence engine. Figs. 4 and 5 illustrate the main components of the targeting
intelligence engine, which determines psychographic personality preferences using the
data gathered and stored in database 303. The targeting intelligence engine is designed to
give confidence intervals about a user's sex, age, and geographical location, for example.
In preparation for the transmission of targeted content from, e.g., Intelligent
Information Systems' Netgravity Advertisement server to client 106, context information
must be generated for each page of the content serving site. The generated context
information is stored in a database 404, for example, a relational database management
system, such as Oracle. Additionally, context information for each potential page from
which the visitor can be referred is generated and stored in database 404.
As shown in Fig. 4, a page context generation engine 400 runs a context
application, e.g. an Oracle ConText application, against all of the pages of the content
serving sites that subscribe to the system to generate the context for each of those pages.
Oracle ConText is preferred because of its ability to determine the gist or theme of a
document on the web using standard SQL commands. Nevertheless, any software that
can read a document and determine its context would be acceptable. Context generation
is also performed against all pages on the Internet that have links to the pages of the
subscribing content serving sites. In Fig. 4, a generic server having a content serving site
is indicated at 402. Generated context information for all pages is stored in database 404
and is indexed by page URL.
Page context generation engine 400 is fully automated and requires only the
URLs of the pages on which the targeted content will be served. The URLs can be
entered by an operator using a Java applet from a web browser interface. When a URL is
entered, the web page is extracted and run through an HTML filter, which removes
HTML tags and non-essential data. The resulting page content is passed to context
application 401 which generates and stores the page context vectors, indexed by URL.
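A simple HTML filter of the kind described might look like the following Java sketch. Which data counts as non-essential is an assumption here (script and style blocks, comments, tags, and two common entities); HtmlFilter is a hypothetical name, and regular expressions are a simplification of real HTML parsing.

```java
import java.util.regex.Pattern;

/** Hypothetical HTML filter: strips markup and non-essential data before context generation. */
final class HtmlFilter {
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)[^>]*>.*?</\\1>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    static String filter(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" "); // drop code and styling
        text = COMMENT.matcher(text).replaceAll(" ");             // drop comments
        text = TAG.matcher(text).replaceAll(" ");                 // drop remaining tags
        // Decode &nbsp; before &amp; so "&amp;nbsp;" is not decoded twice.
        text = text.replace("&nbsp;", " ").replace("&amp;", "&");
        return SPACE.matcher(text).replaceAll(" ").trim();
    }
}
```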
In addition to context vectors for each page with which targeted content will be
provided, all potential referring pages are extracted by a URL robot (a.k.a. a spider) to
locate URLs that contain links to the host page. This can be performed by, e.g.,
database maintenance robot 405. In a similar manner as the host page, each referring
page is filtered and passed to context application 401, which generates and stores the
page context vectors as a database record, indexed by URL.
After the context vectors for each page are determined, a page gender calculation
engine 403 develops a gender/value pair from the resulting context for each page. The
gender/value pair is stored in database 404, indexed by URL.
Figure 5 illustrates a preferred embodiment of an intelligence vortex component
500. When targeted content is requested of a content server 501, intelligence vortex
component 500, in real time, retrieves the stored context information from database 404
for both the page from the content serving site and the referring page.
Intelligence vortex component 500 includes a group of tasks used to determine the
most appropriate targeted content for a particular client 106 using information developed
from database 303 in user profiler 110.
A visitor gender analysis engine 504 extracts gender/value pairs stored in database
404 associated with client 106. An algorithm is applied using the weighted value of each
page gender, resulting in a more accurate determination of the visitor gender than could
be arrived at using a single gender datapoint. For example, the past, present, and
to-be-accessed pages could be used to calculate the gender. If client 106 is one who has been
previously assigned an anonymous identifier, all URLs associated with that client 106 can
be used to increase the accuracy of visitor gender analysis engine 504. If the client's
gender is already known and stored in database 303, then this process can be avoided.
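The weighting algorithm is not specified above. One plausible reading, sketched in Java below, treats each page's gender label as a vote of +1 (female) or -1 (male) and its value as that vote's weight, then takes the weighted average. The label encoding, the -1 to 1 score range, and the class names are assumptions.

```java
import java.util.List;

/** Hypothetical sketch: weighted combination of per-page gender/value pairs. */
final class VisitorGenderAnalysis {
    /** A page's gender label ("F" or "M") and the non-negative value attached to it. */
    static final class PageGender {
        final String gender;
        final double value;

        PageGender(String gender, double value) {
            this.gender = gender;
            this.value = value;
        }
    }

    /**
     * Weighted average of +1 (female) and -1 (male) votes, each weighted by its page value.
     * Returns a score in [-1, 1]; 0 means no evidence either way.
     */
    static double score(List<PageGender> pages) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (PageGender p : pages) {
            double vote = "F".equals(p.gender) ? 1.0 : -1.0;
            weighted += vote * p.value;
            totalWeight += p.value;
        }
        return totalWeight == 0.0 ? 0.0 : weighted / totalWeight;
    }
}
```

Several weak datapoints pointing the same way can outweigh one strong contrary page, which matches the stated goal of being more accurate than a single gender datapoint.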
A visitor location component 505 uses client location information from database
303 or estimates a likely location from the client's IP address provided from server 107
and information from a local WhoIs (TM) command server (GTE, the assignee of the
present invention, provides such a server).
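The location estimate might be sketched as a longest-prefix match of the client's IPv4 address against a table of address blocks, as in the Java below. The table is invented for illustration and uses the 192.0.2.0/24 documentation range; real entries would come from registry (WhoIs) data cached by the system. VisitorLocation is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical sketch: estimates a visitor location by longest-prefix match on IPv4 blocks. */
final class VisitorLocation {
    // Invented example table using the 192.0.2.0/24 documentation range.
    private static final Map<String, String> BLOCKS = Map.of(
        "192.0.2.0/24", "San Francisco, CA, USA",
        "192.0.2.128/25", "Oakland, CA, USA");

    /** Converts a dotted IPv4 address to an unsigned 32-bit value held in a long. */
    static long toLong(String ip) {
        long value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    /** Returns the location of the most specific block containing ip, or null if none does. */
    static String locate(String ip) {
        long addr = toLong(ip);
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : BLOCKS.entrySet()) {
            String[] parts = e.getKey().split("/");
            int len = Integer.parseInt(parts[1]);
            long mask = len == 0 ? 0L : (0xFFFFFFFFL << (32 - len)) & 0xFFFFFFFFL;
            if ((addr & mask) == (toLong(parts[0]) & mask) && len > bestLen) {
                best = e.getValue();
                bestLen = len;
            }
        }
        return best;
    }
}
```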
A page context analysis engine 503 extracts the context data from database 404 of
the pages associated with client 106. An algorithm is applied that determines content that
the client 106 is likely to find preferable. For example, the context of past, present, and
to-be-accessed pages could be used to calculate the client preference. If client 106 is one
that has been previously assigned an anonymous identifier, all URLs associated with that
client 106 can be used to increase the accuracy of page context analysis engine 503.
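Treating each page's context as a sparse vector of theme weights, the preference calculation could be sketched as summing the vectors of the client's pages and comparing the result with candidate content by cosine similarity. The vector representation, the choice of cosine similarity, and the class name PageContextAnalysis are assumptions, not details taken from Oracle ConText or from the system described.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: builds a visitor preference vector from page context vectors. */
final class PageContextAnalysis {
    /** Sums theme weights across the visitor's pages (past, present, and to-be-accessed). */
    static Map<String, Double> preference(List<Map<String, Double>> pageContexts) {
        Map<String, Double> total = new HashMap<>();
        for (Map<String, Double> ctx : pageContexts) {
            ctx.forEach((theme, w) -> total.merge(theme, w, Double::sum));
        }
        return total;
    }

    /** Cosine similarity between two sparse theme vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}
```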
A target content dimensions determination engine 502 uses the output of visitor
gender analysis engine 504, page context analysis engine 503, and visitor location
component 505 to develop target preferences in the context of the known targeted content
available at target content server 501. Similarly, an age analysis engine could also be
provided to determine target preferences. Then, the targeted content is provided to client
106 in real time.
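A minimal sketch of combining the three engine outputs follows. Each available ad carries a hypothetical gender target on the same -1 to 1 scale as the visitor score, an optional target location, and a theme vector; the 0.6/0.25/0.15 weights are illustrative assumptions, not values from the system described, and TargetSelector is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of target content dimension determination. */
final class TargetSelector {
    /** An available ad and its intended audience (all fields are illustrative). */
    static final class Ad {
        final String name;
        final double genderTarget;        // -1 (male) .. 1 (female)
        final String location;            // null means not location-targeted
        final Map<String, Double> themes; // theme vector for the ad

        Ad(String name, double genderTarget, String location, Map<String, Double> themes) {
            this.name = name;
            this.genderTarget = genderTarget;
            this.location = location;
            this.themes = themes;
        }
    }

    /** Returns the ad with the highest combined score, or null if none are available. */
    static Ad choose(List<Ad> ads, double visitorGender, String visitorLocation,
                     Map<String, Double> visitorThemes) {
        Ad best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Ad ad : ads) {
            double context = cosine(ad.themes, visitorThemes);
            // 1.0 when the ad's gender target equals the visitor score, 0.0 when opposite.
            double gender = 1.0 - Math.abs(ad.genderTarget - visitorGender) / 2.0;
            double location = (ad.location == null || ad.location.equals(visitorLocation)) ? 1.0 : 0.0;
            double score = 0.6 * context + 0.25 * gender + 0.15 * location;
            if (score > bestScore) {
                best = ad;
                bestScore = score;
            }
        }
        return best;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}
```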
A database maintenance robot 405 (see Fig. 4) runs continually in the background
at a low priority to ensure that database 404 does not contain cluttering, invalid records;
contains all valid pages; and does not contain improper context data. Database
maintenance robot 405 ensures that all records in database 404 are valid by successively
accessing the database by, e.g., walking the database, extracting a URL, and requesting
the page from the Internet using the URL. In the event that the page no longer exists
(HTTP error 404), the record containing the URL is deleted from the database.
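The validity pass could be sketched in Java as below: it issues an HTTP HEAD request for each stored URL and deletes the record only on a 404 response, so that transient network errors do not purge valid pages. The in-memory map standing in for database 404, the timeouts, and the class name MaintenanceRobot are assumptions.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of the database maintenance robot's validity pass. */
final class MaintenanceRobot {
    /** Only a definite "Not Found" removes a record; other failures are retried on a later pass. */
    static boolean shouldDelete(int httpStatus) {
        return httpStatus == HttpURLConnection.HTTP_NOT_FOUND;
    }

    /** Requests the page behind a stored URL and returns its HTTP status, or -1 on error. */
    static int status(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            conn.disconnect();
            return code;
        } catch (Exception e) {
            return -1;
        }
    }

    /** Walks the records (URL -> context data) and removes those whose pages return 404. */
    static void walk(Map<String, String> records) {
        Iterator<Map.Entry<String, String>> it = records.entrySet().iterator();
        while (it.hasNext()) {
            if (shouldDelete(status(it.next().getKey()))) it.remove();
        }
    }
}
```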
To ensure that all valid pages are in database 404, database maintenance robot 405
continually spiders the web, seeking new URLs that contain links to the banner host
pages. If such a URL is found, it submits the web page for inclusion in the database 404,
after performing the above filtering and context generation as done for the other referring
pages.
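Detecting whether a crawled page links to a banner host page might be sketched as follows: extract the href targets, resolve them against the crawled page's URL, and compare them with the known host pages. The regular expression, the trailing-slash normalization, and the class name ReferrerSpider are assumptions; a production spider would also need to honor robots.txt, follow redirects, and cope with malformed markup.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: checks whether a crawled page links to a banner host page. */
final class ReferrerSpider {
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Extracts link targets and resolves them against the crawled page's URL. */
    static Set<String> links(String pageUrl, String html) {
        Set<String> out = new LinkedHashSet<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                out.add(normalize(base.resolve(m.group(1).trim()).toString()));
            } catch (IllegalArgumentException e) {
                // Skip malformed href values.
            }
        }
        return out;
    }

    /** True if the page links to at least one known banner host page. */
    static boolean linksToHost(String pageUrl, String html, Set<String> hostPages) {
        Set<String> targets = new HashSet<>();
        for (String h : hostPages) targets.add(normalize(h));
        for (String link : links(pageUrl, html)) {
            if (targets.contains(link)) return true;
        }
        return false;
    }

    private static String normalize(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```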
To ensure that the context of each valid page is appropriate and current, database
maintenance robot 405 periodically updates the context data for each linking web page
and banner host page in database 404.
Other embodiments of the invention will be apparent to those skilled in the art
from consideration of the specification and practice of the invention disclosed herein. It
is intended that the specification and examples be considered as exemplary only, with a
true scope and spirit of the invention being indicated by the following claims.