Gazpacho and summer rash:
lexical relationships from temporal patterns of web search queries
Enrique Alfonseca
Massimiliano Ciaramita
Google
Zürich, Switzerland
Keith Hall
ealfonseca@google.com, massi@google.com, kbhall@google.com
Abstract
In this paper we investigate temporal patterns of web search queries. We carry out
several evaluations to analyze the properties of temporal profiles of queries, revealing promising semantic and pragmatic relationships between words. We focus on
two applications: query suggestion and
query categorization. The former shows
a potential for time-series similarity measures to identify specific semantic relatedness between words, which results in
state-of-the-art performance in query suggestion while providing complementary
information to more traditional distributional similarity measures. The query categorization evaluation suggests that the
temporal profile alone is not a strong indicator of broad topical categories.
1
Introduction
The temporal patterns of word occurrences in human communication carry an implicit measure of
their relationship to real-world events and behavioral patterns. For example, when there is an event
affecting a given entity (such as a natural disaster
in a country), the entity name will turn up more
frequently in human conversation, newswire articles and web documents; and people will search
for it more often. Two entities that are closely
related in the real world, such as the name of a
country and a prominent region inside the country are likely to share common events and therefore be closely associated in human communication. Finally, two instances of the same class
are also likely to share common usage patterns.
For example, names of airlines or retail stores are
more likely to be used by day rather than by night
(Chien, 2005).
In this paper we explore the linguistic relationship between phrases that are judged to be sim-
ilar based on their frequency time series correlation in search query logs. For every phrase1 available in WordNet 3.02 (Miller, 1995), we have obtained its temporal signature from query logs, and
calculated all their pairwise correlations. Next,
we study the relationship in the top-ranked pairs
with respect to their distribution in WordNet and a
human-annotated labelling.
We also discuss possible applications of this
data to solve open problems and present the results
of two experiments: one where time series correlations turned out to be highly discriminative; and
another where they were not particularly informative but shed some light on the nature of temporal
semantics and topical categorization:
• Query suggestion, i.e. given a query, generate
a ranked list of alternative queries in which
the user may be interested.
• Query categorization, i.e. given a predefined
set of categories, find the top categories to
which the query can be assigned.
Finally, we illustrate with an example another application of time series in solving information extraction problems.
Although query logs are typically proprietary
data, there are ongoing initiatives, like the Lemur
toolbar3 , which make this kind of information
available for research purposes. Other work
(Bansal and Koudas, 2007b; Bansal and Koudas,
2007a) shows that temporal information can also
be extracted from public data, such as blogs. More
traditional types of text, such as news, are also typically associated with temporal labels; e.g., dates
and timestamps.
This paper is structured in the following way:
1
We use the term phrase to refer to any single word or
multi-word expression that belongs to a synset in WordNet.
Examples of phrases are person, causal entity or william
shakespeare. We focused on the nouns hierarchy only.
2
http://wordnet.princeton.edu
3
http://www.lemurproject.org/
querylogtoolbar/
Section 2 summarizes the related work. Section 3
describes the correlation analysis between all pairs
of phrases from WordNet. Next, Section 4 describes the application to query suggestion, and
Section 5 the application to labelling queries in
topical categories. Section 7 summarizes the conclusions and outlines ideas for future research.
2
Related work
The study of query time series explores a particular instance of the so-called wisdom of the crowds
effect. Within this area, we can distinguish two
kinds of phenomena. Knowledge and resources
assembled by people explicitly, either individually, such as the case of blogs, or in a collaborative way, as in forums or wikis. These resources
are valuable for human-consumption and can also
be exploited in order to learn computational resources (Medelyan et al., 2008; Weld et al., 2008;
Zesch et al., 2008b; Zesch et al., 2008a). On
the other hand, it is possible to acquire useful resources and knowledge from aggregating behavioral patterns of large groups of people, even in
the absence of a conscious effort. There is extensive ongoing research on the use of web search
usage patterns to develop knowledge resources.
Some examples are clustering co-click patterns to
learn semantically related queries (Beeferman and
Berger, 2000), combining co-click patterns with
hitting times (Mei et al., 2008), analyzing query
revisions made by users when querying search engines (Jones et al., 2006), replacing query words
with other words that have the highest pointwise
mutual information (Terra and Clarke, 2004), or
using the temporal distribution of words in documents to improve ranking of search results (Jones
and Diaz, 2007).
Within this second category, an important area
is dedicated to the study of time-related features
of search queries. News aggregators use real-time
frequencies of user queries to detect spikes and
identify news shortly after the spikes occur (Murata, 2008). Web users’ query patterns have also
proved useful for building a real-time surveillance
system that accurately estimates region-by-region
influenza activity with a lag of one day (Ginsberg
et al., 2009). Search engines specifically developed for real-time searches, like Twitter search,
will most likely provide new use cases and scenarios for quickly detecting trends in user search
query patterns.
Figure 1: Time series obtained for the queries
[gazpacho] and [summertime] (normalized
scales).
Our study builds upon the work of Chien
(2005), who observed that queries with highlycorrelated temporal usage patterns are typically
semantically related, and described a procedure
for calculating the correlations efficiently. We
have extended the analysis described in this work,
by performing a more extensive evaluation of the
kinds of semantic relationships that we can find
among temporally-similar queries. We also propose, to our knowledge for the first time, areas
of applications in solving well-established problems which shed some light on the nature of timebased semantic similarity. This work is also related to the analysis of temporal properties of
information streams in data mining (Kleinberg,
2006) and information retrieval from time series
databases (Agrawal et al., 1993).
3
Time-based similarities between
phrases
Similarly to the method described in Chien (2005),
we take a time interval, divide it into equally
spaced subintervals, and represent each phrase of
interest as the sequence of frequencies with which
the phrase was observed in the subintervals. In
our experiments, we have used as source data
the set of fully anonymized query logs from the
Google search engine between January 1st, 2004
and March 1st, 2009.4 .
These data have been aggregated on a daily basis so that we have the daily frequency of the
4
Part of this data is publicly available from http://
www.google.com/trends
queries of interest for over five years. The frequencies are then normalized with the total number of
queries that happened on that day. The normalization is necessary to avoid daily and seasonal variations as there are typically more queries on weekdays than on weekends and fewer queries during
holiday seasons than in the rest of the year. It
also helps reducing the effect deriving from the
fact that the population with Internet access is still
monotonically growing, so we can expect that the
number of queries will become higher and higher
over time.
Given two phrases and their associated time series, the similarity metric used is the correlation
coefficient between the two series (Chien, 2005).
For illustration, Figure 1 shows the time series obtained for two sample queries, gazpacho and summertime, whose time series yield a correlation of
0.92. Similar high correlations can be observed
with other queries related to phenomena that occur mainly in summer in the countries from which
most queries come, like summer rash.
3.1 WordNet-based evaluation
In this section, we describe a study carried out
with the purpose of discovering the traditional
lexico-semantic relationships which hold between
the queries that are most strongly related according to their temporal profiles.
For this evaluation, we have taken the nominal phrases appearing in WordNet 3.0. Given that
users, when writing queries, typically do not pay
attention to punctuation and case, we have normalized all phrases by lowercasing them and removing all punctuation. Next, we collected the time series for each phrase by computing the normalized
daily frequency of each of them as exact queries
in the query logs. The computation of the pairwise correlations was performed in parallel using
the MapReduce infrastructure running over 2048
cores with 500 MB of RAM each. The total execution (including data shuffling and networking
time) took approximately three hours.
Next, we represented the data as a complete
graph where phrases are nodes and the edge between each pair of nodes is weighted by their time
series correlation. Using a simple graph-cut we
obtained clusters of related terms. A minimum
weight threshold equal to 0.9 was applied;5 thus,
5
This threshold is the same used by Chien (2005), and was
confirmed after a manual inspection of a sample of the data
two phrases belong to the same cluster if there is
a path between them only via edges with weight
over 0.9.
The previous procedure produced a set of 604
clusters, with highly different sizes. The first observation is that 70% of the phrases in WordNet
do not have a correlation over 0.9 with any other
phrase, so they are placed alone in singleton clusters. There are several reasons for this. The clusters obtained are very specific: only phrases that
have a very strong temporal association have temporal correlations exceeding the threshold. This is
combined with the fact that we are using a very
restricted vocabulary, namely the terms included
in WordNet, which is many orders of magnitude
smaller than the vocabulary of all possible queries
from the users. Few phrase pairs in WordNet
have a temporal association and popularity strong
enough to be clustered together. Finally, many of
the phrases in WordNet are rare, including scientific names of animals and plants, genuses or families, which are not commonly used. Therefore, the
clusters extracted here correspond to very salient
sets of phrases. If, instead of WordNet, we choose
a vocabulary from known user queries (cf. Section 4), there would be many fewer singleton clusters, as the options of similar phrases to choose
from would be much larger.
From the phrases that belong to clusters, 25%
of the WordNet phrases do not have strong daily
temporal profiles. The typical pattern for these
terms is an almost flat time series, usually with
small drops at summertime and Christmas (when
seasonal leisure-related queries dominate). Therefore, these phrases were collected in just one cluster containing them all. Typical examples of the
elements of this set are names of famous scientists
and mathematicians (Gauss, Isaac Newton, Albert Einstein, Thomas Alva Edison, Hipprocrates,
Gregor Mendel, ...), common terms (fertilization,
famine, macroeconomics, genus, nationalism, ...),
numbers and common first names, among other
things. It is possible that using sub-day intervals
might help to discriminate within this cluster.
The items in this big cluster contrast with periodical events, which display recurring patterns
(e.g., queries related to elections or tax-returns),
and names of famous people and other entities
which appeared in the news in the past few years.
All of these are associated with irregular, spiky
time series. These constitute the final 5% of the
Type
Synonyms
Hyponym/hyperonyms
Siblings in hyponym taxonomy
Meronym/holonyms
Siblings in meronymy taxonomy
Other paths
Not structurally related
Pairs
283
86
611
53
7
471
1009
Examples
(angel cake, angel food cake), (thames, river thames), (armistice day, Nov 11)
(howard hughes, aviator), (muhammad, prophet), (olga korbut, gymnast)
(hiroshima, nagasaki), (junior school, primary school), (aids, welt)
(tutsi, rwanda), (july 4, july), (pyongyang, north korea)
(everglades, everglades national park), (mississipi, orleans)
(maundy thursday, maundy money), (tap water, water tap), (gren party, liberal)
(poppy, veterans day), (olympic games, gimnast), (belmont park, horse racing)
Table 1: Relationships between pairs of WordNet phrases belonging to the same cluster.
phrases belonging to small, highly focused, clusters.
Table 1 shows the relationships that hold between all pairs of phrases belonging to any of the
smaller clusters. Out of 2520 pairs, 283 belong
to the same synset, 697 are related via hyponymy
links, 60 via meronymy links, and 471 by alternating hyponymy and meronymy links in the path.
When the phrases were polysemous, the shortest path between any of their meaning was used.
About 40% of the relations do not have a clear
structural interpretation in WordNet.
The majority of pairs are related via more or
less complex paths in the WordNet graph. Interestingly, even the structurally unrelated terms are
characterized by transparent relations in terms of
world knowledge, as it is the case between poppy
and veteran day. Note as well that sometimes a
WordNet term is used with a meaning not present
in WordNet or in a different language, which may
explain why aids has a very high correlation with
welt (AIDS and welt are both hyponyms of health
problem, but the correlation may be explained better by the AIDS World Day, Welt Aids Tag in German), and it also has a very high correlation with
sida, defined in WordNet as a genus of tropical
herbs, but which is in fact the translation of AIDS
into Spanish. These observations motivated an additional manual labelling of the extracted pairs.
3.2 Hand labelled evaluation
As can be seen in Table 2, most of the terms that
constitute a cluster are related to each other, although the kinds of semantic relationships that
hold between them can vary significantly. Examples of the following kinds can be observed:
• True synonyms, as in the case of november
and nov, or architeuthis and giant squid.
• Variations of people names, especially if a
person’s first name or surname is typically
used to refer to that person, as in the case of
john lennon and lennon, or janis joplin and
•
•
•
•
•
•
•
•
•
•
joplin. Sometimes the variations include personal titles, as it is the case of president carter
and president nixon, which are highly correlated with jimmy carter and richard nixon.
Geographically-related terms, referring to
locations which are located close to each
other, as in the clusters {korea, north korean, south korea, pyongyang, north korea}
and {strasbourg, grenoble, toulouse, poitiers,
lyon, lille, nantes, reims}.
Synonyms of location names, like bahrain
and bahrein.
Derived words, like north korea and north
korean, or lebanese and lebanon.
Generic word optionalizations, which happen when one word in a multi-word phrase
is very correlated to the phrase, as in the
case of spanish inquisition and inquisition,
or red bone marrow and red marrow, where
the most common interpretation for the shortened version of the phrase is the same as for
the long version.
Word reordering, where the two related
phrases have the same words in a different order, as in the case of maple sugar and sugar
maple, or oil palm and palm oil.
Morphological variants: WordNet does not
contain many morphological variants in the
main dataset, but there are a few, like station
of the cross and stations of the cross.
Acronyms, like federal emergency management agency and fema.
Hyperonym-hyponym, like fern and plant.
Sibling terms in a taxonomy, as in the cluster {lutheran, methodist, presbyterian, united
methodist church, lutheran church,methodist
church, presbyterian church,baptist, baptist
church}, which contains mostly names of
Christian denominations.
Co-occurring events in time, as is the case
of hitch and pacifier, both titles of movies
which were launched at almost the same
hydrant,fire hydrant
inauguration day,inauguration,swearing,investiture,inaugural address,inaugural,benediction,oath
indulgence,self indulgence
insulation,heating
interstate highway,interstate, intestine,small intestine
iq,iq test
irish people,irish,irish potato,irish gaelic,gaelic,irish soda bread,irish stew,st patrick,saint patrick,leprechaun,
march 17,irish whiskey,shillelagh
ironsides,old ironsides
james,joyce,james joyce
janis joplin,joplin
jesus christ,pilate,pontius pilate,passion of christ,passion,aramaic
jewish new year,rosh hashana,rosh hashanah,shofar
john lennon,lennon
julep,mint julep,kentucky derby,kentucky
keynote,keynote address
kickoff,time off
korea,north korean,south korea,pyongyang,north korea
l ron hubbard,scientology
leap,leap year,leap day,february 29
left brain,right brain
leftover,leftovers,turkey stew
linseed oil,linseed
listeria,listeriosis,maple leaf
lobster tail,lobster,tails
lohan,lindsay
loire,rhone,rhone alpes
looking,looking for
lutheran,methodist,presbyterian,united methodist church,lutheran church,methodist church,presbyterian church,
baptist,baptist church
mahatma gandhi,mahatma
malignant hyperthermia,hyperthermia
maple sugar,sugar maple
martin luther,martin luther king,luther,martin,martin luther king day
matzo,matzah,matzoh,passover,seder,matzo meal,pesach,haggadah,gefilte fish
mestizo,half blood,half and half
meteorology,weather bureau
moslem,muslim,prophet,mohammed,mohammad,muhammad,mahomet
movie star,star,revenge,film star,menace,george lucas
mt st helens,mount saint helens,mount st helens
myeloma,multiple myeloma
ness,loch ness,loch ness monster,loch,nessie
new guinea,papua new guinea,papua
november,nov
pacifier,hitch
papa,pope,vatican,vatican city,karol wojtyla,john paul ii,holy see,pius xii,papacy,paul vi,john xxiii,the holy see,
vatican ii,pontiff,gulp,pater,nostradamus,ii,pontifex
parietal lobe,glioma,malignant tumor
particle accelerator,atom smasher,hadron,large,tallulah bankhead,bankhead,tanner
pledge,allegiance
president carter,jimmy carter
president nixon,richard nixon,richard m nixon
sept 11,september 11,sep 11,twin towers,wtc,ground zero,world trade center
slum,millionaire,pinto
strasbourg,grenoble,toulouse,poitiers,lyon,lille,nantes,reims
valentine,valentine day,february 14,romantic
aeon,flux
alien,predator
anne hathaway,hathaway
architeuthis,giant squid
basal temperature,basal body temperature
execution,saddam hussein,hussein,saddam,hanging,husain
flood,flooding
george herbert walker bush,george walker bush
intifada,palestine
may 1,may day,maypole
Table 2: Sample of clusters obtained from the temporal correlations.
Type
True synonyms
Variations of people names
People names with and without titles
First name and surname from the same person
Geographically-related terms
Synonyms of location names
Derived words
Word optionalizations
Word reordering
Morphological variants
Acronyms
Cross-language synonyms
Hyperonym/hyponym
Sibling terms
Co-ocurring events in time
Topically related
Unrelated
Clusters
19
42
4
4
18
4
4
87
7
1
1
3
10
10
8
38
72
Table 3: Results of the manual annotation of 2item clusters.
time. A particular example of this is when
the two terms are part of a named entity, as in
the case of quantum and solace, which have
a similar correlation because they appear together in a movie title.
• Topically-related terms, as the cluster
{jesus christ, pilate, pontius pilate, passion of
christ, passion, aramaic}, or the cluster containing popes and the Vatican. A similar example, execution is highly correlated to saddam hussein, because his execution attracted
more interest worldwide during this time period than any other execution. Interestingly,
topical correlation emerges at very specific
granularity.
For the manual analysis of the results, we randomly selected 332 clusters containing only two
items (so that 664 phrases were considered in total). Each of these pairs has been classified in one
of the previous categories. The results of this analysis are shown in Table 3.
4
Application to query suggestion
Query suggestion is a feature of search engines
that helps users reformulate queries in order to better describe their information need with the purpose of reducing the time needed to find the desired information (Beeferman and Berger, 2000;
Kraft and Zien, 2004; Sahami and Heilman, 2006;
Cucerzan and White, 2007; Yih and Meek, 2008).
In this section, we explore the application of a similarity metric based on time series correlations for
finding related queries to suggest to the users.
As a test set, we have used the query sugges-
Method
Random
Web Kernel
Dist. simil.
Time series
Combination
P@1
0.37
0.51
0.72
0.74
0.79
P@3
0.37
0.47
0.63
0.63
0.68
P@5
0.37
0.42
0.60
0.53
0.60
mAP
0.43
0.51
0.64
0.67
0.69
Table 4: Results for the query suggestion task.
tion dataset from (Alfonseca et al., 2009). It contains a set of 57 queries and an average of 22 candidate query suggestions for each of them. Each
suggestion was rated by two human raters using
the 5-point Likert scale defined in (Sahami and
Heilman, 2006), from irrelevant to highly relevant.
The task involves providing a ranking of the suggestions that most closely resembles the human
scores. The evaluation is based on standard IR
metrics: precision at 1, 3 and 5, and mean average
precision. In order to compute the precision- and
recall-based metrics, we infer a binary distinction
from the ratings: related or not related. The interannotator agreement for this dataset given the binary classification as computed by Cohen’s Kappa
is 0.6171.
We used three baselines: the average values that
would be produced by a random scorer of the candidate suggestions, Sahami and Heilman (2006)’s
system (based on calculating similarities between
the retrieved snippets), and a recent competitive
ranker based on calculating standard distributional
similarities (Alfonseca et al., 2009) between the
original query and the suggestion. Please refer to
the referenced work for details.
In order to produce the ranked lists of candidate suggestions for each query, due to the lack of
training data, we have opted for the unsupervised
procedure described in the previous section:
1. Collect the daily time series of each of the
queries and the candidate suggestions.
2. Calculate the correlation between the original
query and each of the candidate suggestions
provided for it, and use it as the candidate’s
score.
3. For each query, rank its candidate suggestions in decreasing order of correlation.
Finally, taking into account that the source of
similarity is very different to the one used for distributional similarity, we tested the hypothesis that
7
5
Application to query categorization
The results from the manual evaluation in Section 3.2 support the conclusion that time series
from query logs provide powerful signals for clustering at a fine-grained level, in some cases uncovering synonyms (may 1st, may day) and even
causal relations (insulation, heating). A natural
question is if temporal information is correlated
with other types of categorizations. In this section we carry out a preliminary exploration of the
relation between query time series and query categorization. To this extent we adapt the data from
the KDD 2005 CUP (Li et al., 2005), which provides a set of queries classified into 67 broad topical categories. Since the data is rather sparse (678
queries) we applied Fourier analysis to “smooth”
the time series.
6
STANDARDIZED FREQUENCY
a combination of the two techniques would be beneficial to capture different features of the queries
and suggestions. We have trained a linear mixture
model combining both scores (time series and distributional similarities), using 10-fold cross validation.
The results are displayed in Table 4. For evaluating the results, whenever a system produced a
tie between several suggestions, we generated 100
random orderings of the elements in the tie, and
report the average scores.
Using distributional similarities and the temporal series turned out to be indistinguishable for the
precision scores at 0.95 confidence, and both are
significantly better than the similarity metric based
on the web kernel. The combination produced an
improvement across all metrics, although not statistically significant at p=0.05.
This is quite a positive finding as the time series
method relies on stored information requiring only
simple and highly optimized lookups.
5
4
3
2
1
0
−1
TIME
Figure 2: RDFT reconstruction for the query
“brush cutters” using the first 25 Fourier coefficients. The squares represent the original time
series datapoints, while the continuous line represents the reconstructed signal.
to use the labelled data7 in a simplified manner to
better understand the semantic properties of query
time series. Each query in the dataset is assessed
by three editors who can assign multiple topic labels from a set of 67 categories belonging to seven
broad topics: Computers, Entertainment, Information, Living, Online Community, Shopping and
Sports. We merged the KDD Cup development
and test set, out of the 911 queries we were able to
retrieve significant temporal information for 678
queries. We joined the sets of labels from each assessor for each query. On average, each query is
assigned five labels.
5.2 DFT analysis
The KDD Cup 2005 6 introduced a query categorization task and dataset consisting of 800,000 unlabeled queries for unsupervised training, and an
evaluation set of 911 queries, 111 for development
and 800 for the final evaluation. The systems submitted for this task can be quite complex and made
full use of the large unlabeled set. Our goal here is
not to provide a comparative evaluation, but only
Assessing the similarity of data represented as
time series has been addressed mostly my means
of Fourier analysis; e.g., Agrawal et al. (1993) introduce a method for efficiently retrieving time
series from databases based on Discrete Fourier
Transform (DFT). Several other methods have
been proposed, e.g., Discrete Wavelet Transform (DWT), however DFT provide a competitive
benchmark approach (Wu et al., 2000).
We use DFT to generate the Fourier coefficients
of the time series and Reverse DFT (RDFT) to reconstruct the original signal using only a subset
of the coefficients. This analysis effectively compresses the time series producing a smoother approximate representation. DFT can be computed
efficiently via Fast Fourier Transform (FFT), with
6
http://www.sigkdd.org/kdd2005/kddcup.
html
7
The KDD Cup dataset is probably the only public query
log providing topical categorization information.
5.1 The KDD CUP data
Method
Random
MostFrequent
DFT-c10
DFT-c50
DFT-c100
DFT-c200
DFT-c400
DFT-c600
DFT-c800
DFT-c1000
Accuracy
0.107
0.490
0.425
0.456
0.502
0.456
0.506
0.481
0.478
0.466
± std-err
0.03
0.07
0.06
0.05
0.05
0.04
0.05
0.06
0.04
0.05
Table 5: Results of the KDD dataset exploration.
complexity O(n log n) where n is the length of the
sequence. The approximate representation is useful not only to address sparsity but can also be used
to efficiently estimate the similarity of two time
series using only a small subset of coefficients as
in (Agrawal et al., 1993). As an example, Figure 2 shows the original time series for the query
“brush cutters” and its reconstructed signal using
only the first 25 Fourier coefficients. The reconstructed signal captures the essence of the periodicity of the query and highlights the yearly peaks
registered for the query in spring and summer.
5.3 Experiment and discussion
To explore the correlation between the structured
temporal representation of queries provided by the
time series and topical categorization we run the
following experiment. Each KDD Cup query was
reconstructed via RDFT using a variable number
of coefficients. The set of 679 queries was partitioned in 10 sets and a 10-fold evaluation was performed. For each fold we trained a classifier on the
remaining 9 folds. We used an average multi-class
perceptron (Freund and Schapire, 1999) adapted to
multi-label learning (Crammer and Singer, 2003).
Each model was trained on a fixed number of 10
iterations. The accuracy of each model was evaluated as the fraction of test items for which the
selected highest scoring class was in the gold standard set provided by the editors. As a lower bound
we estimated the accuracy of randomly choosing
a label for each test instance, and as a baseline we
used the most frequent label. The latter is a powerful predictor: baselines based on class frequency
outperform most of the systems that participated in
the KDD Cup (Lin and Wu, 2009).
Table 5 reports the average accuracy over the
10 runs with relative standard errors. Each DFTbased model is characterized by the number of coefficients used for the reconstruction. Two main
patterns are noticeable. First, none of the differences between the frequency-based baseline and
the DFT models is significant, this seems to indicate that temporal structure alone is not a good discriminator of topic, at least of broad categories. In
retrospect, this is somewhat predictable. The temporal dimension is a basic semantic component of
lexical meaning and world knowledge which is not
necessarily associated with any broad, and to some
extent subjective, categorization. An inspection of
the patterns found in each category shows in fact
that similar patterns often emerge in different categories; e.g., “Halloween costume” and “cheesecake recipe” have a similar yearly periodical pattern with spikes in early winter, while monotonically decaying patterns are shared across all categories; e.g., between computer hardware and kids
toys.
The second interesting finding is the trend of
the DFT system results, higher at low-intermediate
values, providing some initial promising evidence
that DFT analysis generates useful compressed
representations which could be indexed and applied efficiently. Notice that the sequences reconstructed using 1,000 coefficients reproduce almost
identically the original signals.
6
Applications in information extraction
Time series from query logs are particularly relevant for phrases that refer to entities which are
involved in recent events. Therefore, we expect
them to be useful for solving other applications
that require handling entities, such as named entity recognition and classification, relation extraction or disambiguation.
To illustrate this point, we mention an example
of relation extraction between actors and movies:
movies usually have spikes when they are released, and then the frequency again drops sharply.
At the same times, when a movie is released, the
search engine users have a renewed interest in
their actors. Figure 3 displays the time series for
the five most recent movies by Jim Carrey (as of
march 2009), and the time series for Jim Carrey.
As can be seen, the spikes are at exactly the same
points in time. If we add up the series (a) through
(e) into a single series and calculate the correlation
with (f), it turns out to be very high (0.88).
(a)
(b)
(c)
(d)
(e)
(f)
Figure 3: Time series obtained for the five most recent movies with Jim Carrey, and (f) time serie for the
query [jim carrey] (normalized scales).
System
Random
Time series
Precision
0.24
0.53
Recall
0.14
0.66
F-measure
0.17
0.57
Table 6: Results for the query suggestion task.
To validate the hypothesis that this data should
be useful for identifying related entities, we have
performed a small experiment in the following
way: by choosing five popular actors8 and the cinema movies in which they appear since the year
2004, obtained from IMDB9 . Using the time series, for each actor we choose the combination of
movies such that, by adding up the time series of
those movies, we maximise the correlation with
the actor’s time series. It has been implemented
with a greedy beam search, with a beam size of
100. The results are shown in Table 6. The random
baseline randomly associates the movies from the
dataset with the five actors.
We do not believe this to be a perfect feature as,
for example, actors may have a peak in the time series related to their personal lives, not necessarily
to movies. However, the high correlations that can
be obtained when the pairing between actors and
movies is correct, and the improvement with respect a random baseline, indicates this is a feature
which can probably be integrated with other relation extraction systems when handling relationships between entities that have big temporal dependencies.
8
Ben Stiller, Edward Norton, Jim Carrey, Leonardo Dicaprio, and Tom Hanks.
9
www.imdb.com.
7
Conclusions and future work
This paper explores the relationships between
queries whose associated time series obtained
from query logs are highly correlated. The use
of time series in semantic similarity has been discussed by Chien (2005), but only a very preliminary evaluation was described, and, to our knowledge, they had never been applied and evaluated
in solving existing problems. Our results indicate
that, for a substantial percentage of phrases in a
thesaurus, it is possible to find other highly-related
phrases; and we have categorized the kind of semantic relationships that hold between them.
We have found that in a query suggestion
task, somewhat surprisingly, results are comparable with other state-of-the-art techniques based on
distributional similarities. Furthermore, information obtained from time series seems to be complementary with them, as a simple combination of
similarity metrics produces an important increase
in performance..
From an analysis on a query categorization task
the initial evidence suggests that there is no strong
correlation between broad topics and temporal
profiles. This agrees with the intuition that time
provides a fundamental semantic dimension possibly orthogonal to broad topical classification. This
issue however deserves further investigation. Another issue which is worth a deeper investigation
is the application of Fourier transform methods
which offer tools for studying the periodic structure of the temporal sequences.
References
R. Agrawal, C. Faloutsos, and A.N. Swami. 1993. Efficient similarity search in sequence databases. In
Proceedings of the 4th International Conference on
Foundations of Data Organization and Algorithms,
pages 69–84.
E. Alfonseca, K. Hall, and S. Hartmann. 2009. Largescale computation of distributional similarities for
queries. In Proceedings of North American Chapter of the Association for Computational Linguistics
- Human Language Technologies conference.
N. Bansal and N. Koudas. 2007a. BlogScope: a system for online analysis of high volume text streams.
In Proceedings of the 33rd international conference
on Very large data bases, pages 1410–1413.
R. Kraft and J. Zien. 2004. Mining anchor text for
query refinement. In Proceedings of the 13th international conference on World Wide Web, pages 666–
674.
Y. Li, Z. Zheng, and H. Dai. 2005. KDD Cup-2005
report: Facing a grat challenge. SIGKDD Explor.
Newsl., 7(2):91–99.
D. Lin and X. Wu. 2009. Phrase clustering for discriminative learning. In Proceedings of the Annual
Meeting of the Association for Computational Linguistics and the International Joint Conference on
Natural Language Processing of the Asian Federation of Natural Language Processing.
O. Medelyan, C. Legg, D. Milne, and I.H. Witten.
2008. Mining meaning from Wikipedia. Dept. of
Computer Science, University of Waikato.
N. Bansal and N. Koudas. 2007b. BlogScope: Spatiotemporal analysis of the blogosphere. In Proceedings of the 16th international conference on World
Wide Web, pages 1269–1270.
Q. Mei, D. Zhou, and K. Church. 2008. Query suggestion using hitting time. In Proceeding of the
17th ACM conference on Information and knowledge management, pages 469–478.
D. Beeferman and A. Berger. 2000. Agglomerative
clustering of a search engine query log. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining,
pages 407–416.
G.A. Miller. 1995. WordNet: a lexical database for
English. Communications of the ACM, 38(11):39–
41.
S. Chien. 2005. Semantic similarity between search
engine queries using temporal correlation. In Proceedings of the 14th international conference on
World Wide Web, pages 2–11.
K. Crammer and Y. Singer. 2003. Ultraconservative
online algorithms for multiclass problems. Journal
of Machine Learning Research, 3:951–991.
S. Cucerzan and R.W. White. 2007. Query suggestion based on user landing pages. In Proceedings
of the 30th annual international ACM SIGIR conference on Research and development in information
retrieval, pages 875–876.
Y. Freund and R.E. Schapire. 1999. Large margin classification using the perceptron algorithm. Machine
Learning, 37:277–296.
J. Ginsberg, M.H. Mohebbi, R.S. Patel, L. Brammer,
M.S. Smolinski, and L. Brilliant. 2009. Detecting
influenza epidemics using search engine query data.
Nature, 457, February.
R. Jones and F. Diaz. 2007. Temporal profiles of
queries. ACM Transactions on Information Systems,
25(3):14.
R. Jones, B. Rey, O. Madani, and W. Greiner. 2006.
Generating query substitutions. In Proceedings of
the 15th international conference on World Wide
Web, pages 387–396.
J. Kleinberg. 2006. Temporal dynamics of on-line information streams. In Data Stream Management:
Processing High-Speed Data. Springer.
T. Murata. 2008. Detection of breaking news from
online web search queries. New Generation Computing, 26(1):63–73.
M. Sahami and T.D. Heilman. 2006. A web-based kernel function for measuring the similarity of short text
snippets. In Proceedings of the 15th international
conference on World Wide Web, pages 377–386.
E. Terra and C.L.A. Clarke. 2004. Scoring missing
terms in information retrieval tasks. In Proceedings
of the thirteenth ACM international conference on
Information and knowledge management, pages 50–
58.
D.S. Weld, F. Wu, E. Adar, S. Amershi, J. Fogarty,
R. Hoffmann, K. Patel, and M. Skinner. 2008. Intelligence in Wikipedia. In Proceedings of the 23rd
Conference on Artificial Intelligence.
Y. Wu, D. Agrawal, and A. El Abbadi. 2000. A comparison of DFT and DWT based similarity search
in time-series databases. In Proceedings of the 9th
International ACM Conference on Information and
Knowledge Management, pages 488–495.
W. Yih and C. Meek. 2008. Consistent Phrase Relevance Measures. Workshop on Data Mining and
Audience Intelligence for Advertising, page 37.
T. Zesch, C. Muller, and I. Gurevych. 2008a. Extracting lexical semantic knowledge from Wikipedia and
Wiktionary. In Proceedings of the Conference on
Language Resources and Evaluation.
T. Zesch, C. Muller, and I. Gurevych. 2008b. Using
Wiktionary for computing semantic relatedness. In
Proceedings of the Conference on Artificial Intelligence, pages 861–867.