Lecture 5:
Further Enhancements to Scoring, Results Assembly, and Evaluation

The lecture discusses enhancements to scoring and evaluation in information retrieval systems, focusing on efficient cosine ranking and methods to select the top K documents without computing all cosine values. It introduces various pruning techniques and scoring methods, including champion lists and static quality scores, to improve retrieval efficiency. The evaluation of IR systems is emphasized, highlighting the importance of user happiness and the measurement of relevance through precision-recall metrics.
Today’s Lecture
• Speeding up vector space ranking
• Putting together a complete search system
• Will require learning about a number of miscellaneous topics and heuristics
• Evaluation of the IR system
Efficient Cosine Ranking
• Find the K docs in the collection "nearest" to the query, i.e., the K largest query-doc cosines.
• Efficient ranking:
  – Computing a single cosine efficiently.
  – Choosing the K largest cosine values efficiently.
• Can we do this without computing all N cosines?


Computing Cosine Scores
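The algorithm figure for this slide is not reproduced here. Below is a minimal Python sketch of term-at-a-time cosine scoring with per-document accumulators; the postings layout (term → list of (docID, weight) pairs), the query weight dict, and the precomputed document lengths are illustrative assumptions, not the lecture's exact data structures.

```python
import heapq
from collections import defaultdict

def cosine_scores(query_weights, postings, doc_length, K=10):
    """Term-at-a-time cosine scoring with one accumulator per candidate doc.

    query_weights: dict term -> weight of the term in the query
    postings:      dict term -> list of (doc_id, weight of term in doc)
    doc_length:    dict doc_id -> Euclidean length of the doc vector
    Returns the K (doc_id, score) pairs with the largest cosine scores.
    """
    scores = defaultdict(float)                  # accumulators
    for term, wq in query_weights.items():
        for doc_id, wd in postings.get(term, []):
            scores[doc_id] += wq * wd            # accumulate dot-product terms
    for doc_id in scores:
        scores[doc_id] /= doc_length[doc_id]     # length-normalize
    return heapq.nlargest(K, scores.items(), key=lambda item: item[1])
```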
Computing the K Largest Cosines: Selection vs. Sorting
• Typically we want to retrieve the top K docs (in the cosine ranking for the query)
  – not to totally order all docs in the collection
• Can we pick off the docs with the K highest cosines?
• Let J = number of docs with nonzero cosines
  – We seek the K best of these J
Sec. 7.1

Use a Heap for Selecting Top K
• Binary tree in which each node's value > the values of its children (max heap)
• Takes 2J operations to construct; then each of the K "winners" is read off in 2 log J steps.
• For J = 1M, K = 100, this is about 10% of the cost of sorting.
(Figure: example binary max heap; the root holds the largest value 1, with smaller values such as .9, .8, .3 and .1 at lower levels.)
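A minimal sketch of this selection step in Python, using toy scores in place of real cosine accumulators; heapify-then-pop mirrors the cost analysis above (linear construction, then O(log J) per winner).

```python
import heapq

# scores: doc_id -> cosine score for the J docs with nonzero cosines (toy data)
scores = {1: 0.9, 2: 0.3, 3: 0.8, 4: 0.1, 5: 0.3}
K = 3

# Build a max-heap over all J entries (heapq is a min-heap, so negate scores);
# heapify is linear in J, matching the slide's "2J operations to construct".
heap = [(-s, doc) for doc, s in scores.items()]
heapq.heapify(heap)

# Read off the K winners; each pop costs O(log J), matching "2 log J steps".
top_k = [(doc, -neg) for neg, doc in (heapq.heappop(heap) for _ in range(K))]
print(top_k)   # [(1, 0.9), (3, 0.8), (2, 0.3)]
```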
Bottlenecks
• Primary computational bottleneck in scoring: cosine computation
• Can we avoid all this computation?
• Yes, but may sometimes get it wrong
  – a doc not in the top K may creep into the list of K output docs
  – Is this such a bad thing?
Cosine Similarity Is Only a Proxy
• User has a task and a query formulation
• Cosine matches docs to the query
• Thus cosine is in any case only a proxy for user happiness
• If we get a list of K docs "close" to the top K by the cosine measure, that should be OK
Generic Approach
• Find a set A of contenders, with K < |A| << N
  – A does not necessarily contain the top K, but has many docs from among the top K
  – Return the top K docs in A
• Think of A as pruning non-contenders
• The same approach is also used for other (non-cosine) scoring functions
• Will look at several schemes following this approach
Pruning Non-Contenders
• Index elimination
• Champion lists
• Static quality scores
• Impact ordering
• Cluster pruning
Index Elimination
• Basic cosine computation algorithm only considers docs
containing at least one query term

• Take this further:


– Only consider high-idf query terms
– Only consider docs containing many query terms
High-idf Query Terms Only
• For a query such as catcher in the rye
• Only accumulate scores from catcher and rye
• Intuition: in and the contribute little to the scores and so don't alter rank-ordering much
• Benefit:
  – Postings of low-idf terms have many docs → these (many) docs get eliminated from the set A of contenders
Docs Containing Many Query Terms
• Any doc with at least one query term is a candidate for the top K output list
• For multi-term queries, only compute scores for docs containing several of the query terms
  – Say, at least 3 out of 4
  – Imposes a "soft conjunction" on queries, seen on web search engines (early Google)
• Easy to implement in postings traversal
Sec. 7.1.2

3 of 4 Query Terms

  Antony    → 3 4 8 16 32 64 128
  Brutus    → 2 4 8 16 32 64 128
  Caesar    → 1 2 3 5 8 13 21 34
  Calpurnia → 13 16 32

Scores only computed for docs 8, 16 and 32.
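A minimal sketch of this "soft conjunction" filter over the postings above; the dict-of-lists layout is an assumption for illustration.

```python
from collections import Counter

# Postings from the example above: term -> sorted list of docIDs
postings = {
    "antony":    [3, 4, 8, 16, 32, 64, 128],
    "brutus":    [2, 4, 8, 16, 32, 64, 128],
    "caesar":    [1, 2, 3, 5, 8, 13, 21, 34],
    "calpurnia": [13, 16, 32],
}

# Count, for each doc, how many of the query terms it contains.
term_counts = Counter(doc for plist in postings.values() for doc in plist)

# Soft conjunction: keep only docs containing at least 3 of the 4 query terms.
candidates = sorted(doc for doc, c in term_counts.items() if c >= 3)
print(candidates)   # [8, 16, 32]
```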
Champion lists
• Precompute, for each dictionary term t, the r docs of highest weight in t's postings
  – Call this the champion list for t
  – (aka fancy list or top docs for t)
• At query time, only compute scores for docs in the champion list of some query term
  – Pick the K top-scoring docs from amongst these
• Note that r has to be chosen at index build time
  – Thus, it's possible that r < K
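A minimal sketch of champion-list construction at index build time, assuming postings entries of the form (docID, weight):

```python
import heapq

def build_champion_lists(postings, r):
    """For each term, keep only the r postings with the highest weight.

    postings: dict term -> list of (doc_id, weight)
    Returns:  dict term -> list of (doc_id, weight), at most r entries per term
    """
    return {
        term: heapq.nlargest(r, plist, key=lambda entry: entry[1])
        for term, plist in postings.items()
    }

# At query time, score only docs appearing in some query term's champion list:
def candidate_docs(query_terms, champions):
    return {doc for t in query_terms for doc, _ in champions.get(t, [])}
```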
Static quality scores
• We want top-ranking documents to be both relevant and authoritative
• Relevance is being modeled by cosine scores
• Authority is typically a query-independent property of a document
• Examples of authority signals:
  – Wikipedia among websites
  – Articles in certain newspapers
  – A research paper with many citations
  – Many bitly's, diggs, or other social-bookmark marks
  – (PageRank)
Modeling authority
• Assign to each document d a query-independent quality score in [0,1]
  – Denote this by g(d)
• Thus, a quantity like the number of citations is scaled into [0,1]
Net score
• Consider a simple total score combining cosine relevance and authority
• net-score(q,d) = g(d) + cosine(q,d)
  – Can use some other linear combination
  – Indeed, any function of the two "signals" of user happiness – more later
• Now we seek the top K docs by net score
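A minimal sketch of the net score combination, assuming a dict of cosine scores for the candidate docs and a dict g of static quality scores in [0,1]:

```python
import heapq

def top_k_by_net_score(cosine, g, K=10):
    """net-score(q, d) = g(d) + cosine(q, d); any other combination could be
    substituted here, e.g. a weighted sum a*g(d) + (1 - a)*cosine(q, d)."""
    net = {d: g.get(d, 0.0) + c for d, c in cosine.items()}
    return heapq.nlargest(K, net.items(), key=lambda item: item[1])
```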
Top K by net score – fast methods
• First idea: order all postings by g(d)
• Key: this is a common ordering for all postings
• Thus, can concurrently traverse query terms' postings for
  – Postings intersection
  – Cosine score computation
Why order postings by g(d)?
• Under g(d)-ordering, top-scoring docs likely to appear early in
postings traversal
• In time-bound applications (say, we have to return whatever
search results we can in 50 ms), this allows us to stop postings
traversal early
– Short of computing scores for all docs in postings
Champion lists in g(d)-ordering
• Can combine champion lists with g(d)-ordering
• Maintain for each term a champion list of the r docs with highest g(d) + tf-idf(t,d)
• Seek top-K results from only the docs in these champion lists
High and low lists
• For each term, we maintain two postings lists called high and low
  – Think of high as the champion list
• When traversing postings on a query, only traverse high lists first
  – If we get more than K docs, select the top K and stop
  – Else proceed to get docs from the low lists
• Can be used even for simple cosine scores, without global quality
g(d)
• A means for segmenting index into two tiers
• Document-at-a-time scoring
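A minimal sketch of the high/low fallback at query time; the high/low postings dicts and the score helper are hypothetical stand-ins, and the same fallback logic generalizes to the multi-tier indexes described later.

```python
def retrieve_with_tiers(query_terms, high, low, score, K=10):
    """high, low: dict term -> list of docIDs (high plays the champion-list role).
    score(doc, query_terms) is a hypothetical scoring function (e.g. cosine
    or net score). Traverse the high lists first; fall back to the low lists
    only if the high lists yield fewer than K candidate docs."""
    candidates = {d for t in query_terms for d in high.get(t, [])}
    if len(candidates) < K:                         # not enough from the high tier
        candidates |= {d for t in query_terms for d in low.get(t, [])}
    ranked = sorted(candidates, key=lambda d: score(d, query_terms), reverse=True)
    return ranked[:K]
```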
Cluster pruning
Preprocessing:
• Pick √N docs at random: call these leaders
• For every other doc, pre-compute its nearest leader
  – Docs attached to a leader: its followers
  – Likely: each leader has ~ √N followers
Cluster pruning
Query Processing:
• Process a query as follows:
  – Given a query Q, find its nearest leader L.
  – Seek the K nearest docs from among L's followers.
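A minimal sketch of both phases, assuming unit-length NumPy document vectors so that dot products equal cosine similarities:

```python
import numpy as np

def build_clusters(doc_vectors, rng=np.random.default_rng(0)):
    """doc_vectors: (N, dim) array of unit-length doc vectors.
    Pick sqrt(N) random leaders and attach every doc to its nearest leader."""
    N = doc_vectors.shape[0]
    leaders = rng.choice(N, size=int(np.sqrt(N)), replace=False)
    sims = doc_vectors @ doc_vectors[leaders].T          # cosine to each leader
    nearest = leaders[np.argmax(sims, axis=1)]           # nearest leader per doc
    followers = {ldr: np.where(nearest == ldr)[0] for ldr in leaders}
    return leaders, followers

def query_clusters(q, doc_vectors, leaders, followers, K=10):
    """Find the query's nearest leader, then the K nearest docs among its followers."""
    ldr = leaders[np.argmax(doc_vectors[leaders] @ q)]
    cand = followers[ldr]
    order = np.argsort(-(doc_vectors[cand] @ q))[:K]
    return cand[order]
```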
Visualization
(Figure: leaders and their followers forming clusters of documents around a query.)
Parametric and zone indexes
• Thus far, a doc has been a sequence of terms
• In fact, documents have multiple parts, some with special semantics:
– Author
– Title
– Date of publication
– Language
– Format
– etc.
• These constitute the metadata about a document
Fields
• We sometimes wish to search by these metadata
  – E.g., find docs authored by William Shakespeare in the year 1601, containing alas poor Yorick
• Year = 1601 is an example of a field
• Also, author last name = shakespeare, etc.
• Field or parametric index: postings for each field value
  – Sometimes build range trees (e.g., for dates)
• Field query typically treated as conjunction
– (doc must be authored by shakespeare)
Zone
• A zone is a region of the doc that can contain an arbitrary amount of text, e.g.,
  – Title
  – Abstract
  – References …
• Build inverted indexes on zones as well to permit querying
• E.g., "find docs with merchant in the title zone and matching the query gentle rain"
Example zone indexes

Encode zones in dictionary vs. postings.
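The figure itself is not reproduced; below is a minimal sketch of the two encodings for a hypothetical term merchant, with made-up docIDs:

```python
# Option 1: encode the zone in the dictionary -- one postings list per (term, zone)
zone_in_dictionary = {
    "merchant.title":  [2, 4],
    "merchant.author": [3],
    "merchant.body":   [2, 3, 4, 5],
}

# Option 2: encode the zone in the postings -- one list per term,
# each posting carrying the set of zones in which the term occurs
zone_in_postings = {
    "merchant": [(2, {"title", "body"}), (3, {"author", "body"}),
                 (4, {"title", "body"}), (5, {"body"})],
}

# A zone query such as "merchant in the title zone" then filters on the zone:
docs_with_merchant_in_title = [d for d, zones in zone_in_postings["merchant"]
                               if "title" in zones]
print(docs_with_merchant_in_title)   # [2, 4]
```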


Tiered indexes
• Break postings up into a hierarchy of lists
– Most important
–…
– Least important
• Can be done by g(d) or another measure
• Inverted index thus broken up into tiers of decreasing
importance
• At query time use top tier unless it fails to yield K docs
– If so drop to lower tiers
Example tiered index
Query term proximity
• Free text queries: just a set of terms typed into the query box – common on the web
• Users prefer docs in which query terms occur within close
proximity of each other
• Let w be the smallest window in a doc containing all query terms,
e.g.,
• For the query strained mercy the smallest window in the
doc The quality of mercy is not strained is ? (words)
• Would like scoring function to take this into account – how?
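One simple option is to compute w directly by brute force, as in the sketch below; for the strained mercy example above it returns 4 words.

```python
def smallest_window(doc_tokens, query_terms):
    """Length (in words) of the smallest window of doc_tokens containing
    every query term at least once; None if some term is missing."""
    terms = set(query_terms)
    best = None
    for start in range(len(doc_tokens)):
        seen = set()
        for end in range(start, len(doc_tokens)):
            if doc_tokens[end] in terms:
                seen.add(doc_tokens[end])
                if seen == terms:                 # window [start, end] covers all terms
                    width = end - start + 1
                    if best is None or width < best:
                        best = width
                    break
    return best

doc = "the quality of mercy is not strained".split()
print(smallest_window(doc, ["strained", "mercy"]))   # 4  (mercy ... strained)
```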
Impact-ordered postings
• We only want to compute scores for docs for which wf(t,d) (weighted term frequency) is high enough
• Term-at-a-time scoring
• We sort each postings list by wf(t,d)
• Now: not all postings in a common order!
• How do we compute scores in order to pick off top K?
– Two ideas follow
1. Early termination
• When traversing t's postings, stop early after either
  – a fixed number r of docs, or
  – wf(t,d) drops below some threshold
• Take the union of the resulting sets of docs
– One from the postings of each query term
• Compute only the scores for docs in this union
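A minimal sketch of early termination, assuming each impact-ordered postings list is already sorted by decreasing wf(t,d); r and the wf threshold are illustrative parameters.

```python
def early_terminated_candidates(query_terms, impact_postings, r=100, wf_min=0.1):
    """impact_postings: dict term -> list of (doc_id, wf) sorted by decreasing wf.
    For each query term, stop after r postings or once wf drops below wf_min;
    return the union of the surviving docs (scores are then computed only
    for docs in this union)."""
    union = set()
    for t in query_terms:
        for i, (doc_id, wf) in enumerate(impact_postings.get(t, [])):
            if i >= r or wf < wf_min:        # early termination for this term
                break
            union.add(doc_id)
    return union
```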
2. idf-ordered terms
• When considering the postings of query terms
• Look at them in order of decreasing idf
– High idf terms likely to contribute most to score
• As we update score contribution from each query term
– Stop if doc scores relatively unchanged
• Can apply to cosine or some other net scores
Query parsers
• A free text query from the user may in fact spawn one or more queries to the indexes, e.g., the query rising interest rates
  – Run the query as a phrase query
  – If <K docs contain the phrase rising interest rates, run the two shorter phrase queries rising interest and interest rates
  – If we still have <K docs, run the vector space query rising interest rates
  – Rank matching docs by vector space scoring
• This sequence is issued by a query parser
Aggregate scores
• We've seen that score functions can combine cosine, static quality, proximity, etc.
• How do we know the best combination?
• Some applications – expert-tuned
• Increasingly common: machine-learned
Putting it all together
Evaluation of Information Retrieval Systems
• How do we evaluate?
• Try a few queries on the engine and say "Not bad"?
• Is that enough?
• How fast does it index?
• Number of documents/hour
• Incremental indexing
• How fast does it search?
• Latency and CPU needs
• Does it recommend related products/links?
Measure for a search engine
• All of the preceding criteria are measurable
◦ We can make expressiveness precise

• The key measurement: user happiness


◦ What is this?
◦ Speed of response/size of index are factors
◦ But blindingly fast, useless answers won’t make a user happy
◦ Need a way of quantifying user happiness
Measuring user happiness
• Issue: who is the user we are trying to make happy?
◦ Depends on setting

• Web Engine
◦ User finds what he/she wants and returns to the engine

• eCommerce site
Happiness: Elusive to measure
• Most common proxy: relevance of search results
• But how do you measure relevance?
• Three elements:
  1. A benchmark document collection
     • CRANFIELD, REUTERS, TREC
2. A benchmark suite of queries
3. An assessment of either Relevant or Nonrelevant for each query and
each document
• Relevance is assessed relative to an information need, not a query
Evaluating an IR system
• Note: user need is translated into a query
• Relevance is assessed relative to the user need, not the query
• E.g., information need: My swimming pool bottom is becoming black and needs to be cleaned
• Query: pool cleaner
• Assess whether the doc addresses the underlying need, not whether it has these words
Binary Assessments

F Measure: A single measure that trades off precision versus recall
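The contingency-table figure behind these binary assessments is not reproduced; the standard definitions, stated here for completeness, are:

  Precision P = tp / (tp + fp)   (fraction of retrieved docs that are relevant)
  Recall    R = tp / (tp + fn)   (fraction of relevant docs that are retrieved)
  F1 = 2PR / (P + R)             (harmonic mean of precision and recall)

where tp, fp and fn count relevant retrieved, nonrelevant retrieved, and relevant non-retrieved documents respectively; F1 is the balanced (β = 1) case of the F measure.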


Precision-recall Curve

(Figure: a precision-recall curve, with recall on the x-axis and precision on the y-axis, both ranging from 0.0 to 1.0.)
Rank Based Measures
Binary relevance

◦ Precision@K (P@K)
◦ Mean Average Precision (MAP)
Precision@K (P@K)
• Set a rank threshold K
• Compute % relevant in top K
• Ignores documents ranked lower than K
Mean Average Precision
• Consider the rank positions of the relevant docs: K1, K2, …, Kr
• Compute Precision@K for each of K1, K2, …, Kr
• Average Precision = the average of these P@K values
• MAP is the average of Average Precision across multiple queries/rankings
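A minimal Python sketch of these rank-based measures, following the slide's definition of Average Precision (averaging P@K over the ranks of the relevant docs that appear in the ranking); the judgments are assumed to be a set of relevant docIDs per query.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked docs that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Average of P@K over the rank positions K1, K2, ..., Kr of the
    relevant docs that appear in the ranking."""
    hits = [i + 1 for i, d in enumerate(ranked) if d in relevant]
    if not hits:
        return 0.0
    return sum(precision_at_k(ranked, relevant, k) for k in hits) / len(hits)

def mean_average_precision(runs):
    """runs: list of (ranked list, set of relevant docIDs), one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Example: relevant docs {1, 3} retrieved at ranks 1 and 3
print(average_precision([1, 2, 3, 4], {1, 3}))   # (1/1 + 2/3) / 2 ≈ 0.833
```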
