02 - Lect2 Search Engines - Part1

The document outlines the requirements and components of search engine design, focusing on effectiveness and efficiency. It details the indexing and query processes, including user interaction, ranking, and evaluation components that enhance search results. Key aspects include the importance of logging user behavior for improving search performance and the role of ranking in determining the relevance of documents.

Information Storage and Retrieval
CS418

Search Engine Architecture


Lecture 2
Dr. Ebtsam AbdelHakam
ebtsamabd@gmail.com
Computer Science Dept.
Minia University
Requirements of Designing a Search Engine

The two primary requirements of a search engine are:

• Effectiveness (quality): We want to be able to retrieve the most relevant set of documents possible for a query.

• Efficiency (speed): We want to process queries from users as quickly as possible.
Designing a Search Engine
Search engine design balances two factors:

‣ Effectiveness – accuracy of results, presentation of results, absence of spam, good ad selection

‣ Efficiency / Performance – response time, concurrency, disaster mitigation, security issues

These factors deeply impact the architecture of these systems. Often the engineering solutions feed back into research (NoSQL, MapReduce, etc.).
Search Engine Basic Building Blocks

Search engine components support two major functions, which are called:

1- The indexing process: The indexing process builds the structures that enable searching.
The index (inverted index) is an efficient data structure that represents the documents of a corpus and allows fast searching of those documents using the indexed information.

2- The query process: The query process uses those structures (the index) and a person’s query to produce a ranked list of documents.
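
The two functions above can be illustrated with a minimal sketch. It assumes whitespace tokenization and a tiny in-memory corpus; the names `build_inverted_index`, `search`, and `corpus` are hypothetical and not from the lecture. The query step here only matches documents and does not rank them.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Indexing process: map each term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Query process (unranked): ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical three-document corpus, for illustration only.
corpus = {
    1: "search engines build an index",
    2: "the index enables fast search",
    3: "ranking orders the documents",
}
index = build_inverted_index(corpus)
print(index["index"])
print(search(index, "fast search"))
```

Looking up a term costs one dictionary access instead of a scan over every document, which is why the index is the structure that makes fast searching possible.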
Query process
1. User interaction
It supports the creation and refinement of the user's query and displays the results.

2. Ranking
It uses the query and the indexes to create a ranked list of documents.

3. Evaluation
It monitors and measures effectiveness and efficiency. It is done primarily offline.
Query Process
(User Interaction)
The user interaction component provides the interface between the person doing the searching and the search engine.

Its three tasks are:

1- Accepting the user’s query, expressed in a defined query language, and transforming it into index terms.

- Query transformation: The user interface parses user queries and converts the search terms into a form that is acceptable as input to the query engine, i.e. into index terms that appear in the index vocabulary.
- Spell checking and query suggestion: These suggest improvements to the user, or run alternative queries in the background.
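
A minimal sketch of this first task, under stated assumptions: the vocabulary and stopword list are hypothetical, tokenization is a simple regular expression, and spelling suggestions use the standard library's `difflib.get_close_matches` rather than any method named in the lecture.

```python
import difflib
import re

# Hypothetical index vocabulary and stopword list, for illustration only.
VOCABULARY = {"search", "engine", "index", "ranking", "query", "retrieval"}
STOPWORDS = {"the", "a", "an", "of", "for", "in"}


def tokenize(query):
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def transform_query(query):
    """Keep only tokens that are index terms (in the vocabulary, not stopwords)."""
    return [t for t in tokenize(query) if t not in STOPWORDS and t in VOCABULARY]


def suggest(query):
    """Map each unknown token to its closest vocabulary term, if one is close enough."""
    suggestions = {}
    for token in tokenize(query):
        if token in VOCABULARY or token in STOPWORDS:
            continue
        close = difflib.get_close_matches(token, sorted(VOCABULARY), n=1, cutoff=0.8)
        if close:
            suggestions[token] = close[0]
    return suggestions


print(transform_query("The Ranking of a Query"))
print(suggest("serch engine"))
```

A real engine would typically draw suggestions from query logs rather than from string similarity alone; the similarity cutoff of 0.8 is an arbitrary choice for this sketch.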
User Interaction
Query suggestion (a prank)

User Interaction Component

Its three tasks are (continued):

2- Taking the ranked list of documents from the search engine and organizing it into the results shown to the user.

‣ Displays the top-ranked results

‣ Generates snippets to show how queries match documents

‣ Highlights important words and passages

‣ Retrieves query-relevant advertising
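
Snippet generation and highlighting can be sketched as follows. This is an illustrative approach, not the lecture's method: it assumes the snippet is a fixed window of words around the first query-term match, and it marks matches with square brackets. The function name `make_snippet` and the `window` parameter are hypothetical.

```python
import re


def make_snippet(text, query_terms, window=5):
    """Return a window of words around the first match, with matches in [brackets]."""
    words = text.split()
    terms = {t.lower() for t in query_terms}

    def is_match(word):
        # Strip punctuation so that "engine." still matches the term "engine".
        return re.sub(r"\W+", "", word).lower() in terms

    hits = [i for i, word in enumerate(words) if is_match(word)]
    if not hits:
        # No query term found: fall back to the opening words of the document.
        return " ".join(words[: 2 * window + 1])

    start = max(0, hits[0] - window)
    end = min(len(words), hits[0] + window + 1)
    shown = [f"[{w}]" if is_match(w) else w for w in words[start:end]]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(words) else ""
    return prefix + " ".join(shown) + suffix


text = "The ranking component is the core of the search engine."
print(make_snippet(text, ["ranking"], window=2))
```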


User Interaction Component


3- Finally, this component also provides a range of techniques for refining the query so that it better represents the information need.

‣ Query expansion adds terms related to the query terms (e.g. synonyms, related entities)

‣ Relevance feedback runs an initial query, then uses the top-ranked documents to expand the query for a second run
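
The relevance feedback loop described above can be sketched as below. The term-selection rule is an assumption made for illustration: the expansion terms are simply the most frequent new terms in the top-ranked documents, with ties broken alphabetically. The name `expand_query` and its parameters are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, top_docs, n_new_terms=2, stopwords=frozenset()):
    """Add the most frequent new terms from the top-ranked documents to the query."""
    counts = Counter()
    for text in top_docs:
        counts.update(text.lower().split())

    original = set(query_terms)
    candidates = [
        (term, count)
        for term, count in counts.items()
        if term not in original and term not in stopwords
    ]
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return list(query_terms) + [term for term, _ in candidates[:n_new_terms]]


# Hypothetical top-ranked documents returned by an initial run of the query.
top_docs = ["tropical fish tank care", "fish tank filters", "tropical fish food"]
print(expand_query(["tropical", "fish"], top_docs))
```

The expanded query would then be submitted for the second run; practical systems usually weight candidate terms rather than counting raw frequency.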
Query Process
(Ranking)
Ranking Component
• The ranking component is the core of the search engine.
• It takes the transformed query from the user interaction component and generates a ranked list of documents using scores based on a retrieval model.

• Ranking must be both efficient, since many queries may need to be processed in a short time, and effective, since the quality of the ranking determines whether the search engine accomplishes the goal of finding relevant information.

‣ The efficiency of ranking depends on the indexes.

‣ The effectiveness depends on the retrieval model.


Ranking
Document scoring

‣ A score is assigned to each likely-relevant document based on how well it matches the query.

‣ Scoring is a core component of a search engine, and often its most closely guarded secret.

‣ Many approaches and variations have been developed.

‣ The basic form is the dot product of query term weights and corresponding document weights: score(Q, D) = Σ_i q_i · d_i, where q_i and d_i are the weights of term i in the query and in the document.
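
A minimal sketch of dot-product scoring, assuming that queries and documents are represented as dictionaries from term to weight. How the weights are computed depends on the retrieval model and is not specified here; the names and example weights below are hypothetical.

```python
def dot_product_score(query_weights, doc_weights):
    """Sum over query terms of (query term weight * document term weight)."""
    return sum(
        weight * doc_weights.get(term, 0.0)
        for term, weight in query_weights.items()
    )


def rank(query_weights, docs):
    """Return (doc_id, score) pairs sorted by descending score."""
    scored = [
        (doc_id, dot_product_score(query_weights, weights))
        for doc_id, weights in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical term weights, for illustration only.
query = {"search": 1.0, "engine": 1.0}
docs = {
    "d1": {"search": 2.0, "engine": 1.0},
    "d2": {"search": 1.0},
    "d3": {"ranking": 3.0},
}
print(rank(query, docs))
```

Terms missing from a document contribute zero, so only the terms shared by the query and the document affect the score.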
Query Process
(Evaluation)
Evaluation component
• The task of the evaluation component is to measure and monitor effectiveness and efficiency.

• An important part of that is to record and analyze user behavior using log data.

• The results of evaluation are used to tune and improve the ranking component.

• Most of the evaluation component is not part of the online search engine, apart from logging user and system data.

• Evaluation is primarily an offline activity, but it is a critical part of any search application.
Evaluation component
• Logging

‣ Logging user interaction is an essential tool for measuring performance

‣ Query logs and clickthrough data are used for query suggestion, spell checking, query caching, ranking, advertising search, …

• Query logs of the users’ interactions with the search engine are collected and are of paramount importance.

• They can improve the search experience, speed up results, store the results of common queries, and identify sources of new revenue.
Evaluation component

Pages that are clicked or ignored might be logged to improve the overall quality of the search engine, but also to detect patterns in user activity (i.e. data mining).

Query logs can be used for a variety of other purposes, including:
1. Keeping track of a history of user queries
2. Generating spell-checking logs (instead of running the spellchecker every time)
3. Recording the time spent on a query or a particular document
4. Supporting query suggestion, spell checking, query caching, ranking, and advertising search, using query logs together with clickthrough data
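
As an illustration of how such logs might be analyzed offline, the sketch below assumes a log of (query, clicked document) pairs, where `None` means the user clicked nothing. The log format and the function names are hypothetical. Frequent queries are natural candidates for query caching, and per-query click counts are one possible ranking signal.

```python
from collections import Counter, defaultdict


def popular_queries(log, n):
    """Return the n most frequent queries, e.g. as candidates for caching."""
    return Counter(query for query, _ in log).most_common(n)


def click_counts(log):
    """Count clicks per document for each query (clickthrough data)."""
    clicks = defaultdict(Counter)
    for query, clicked_doc in log:
        if clicked_doc is not None:
            clicks[query][clicked_doc] += 1
    return clicks


# Hypothetical log entries: (query, clicked document id or None).
log = [
    ("python tutorial", "d1"),
    ("python tutorial", "d1"),
    ("python tutorial", "d2"),
    ("search engine", None),
    ("search engine", "d7"),
]
print(popular_queries(log, 1))
print(click_counts(log)["python tutorial"].most_common(1))
```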
