02 - Lect2 Search Engines - Part1

The document outlines the requirements and components of search engine design, focusing on effectiveness and efficiency. It details the indexing and query processes, including user interaction, ranking, and evaluation components that enhance search results. Key aspects include the importance of logging user behavior for improving search performance and the role of ranking in determining the relevance of documents.

Uploaded by

mh861590


Information Storage and Retrieval
CS418

Search Engine Architecture

Lecture 2
Dr. Ebtsam AbdelHakam
ebtsamabd@gmail.com
Computer Science Dept.
Minia University
Requirements of Designing a Search Engine

The two primary requirements of a search engine are:

• Effectiveness (quality): We want to be able to retrieve the most relevant set of documents possible for a query.

• Efficiency (speed): We want to process queries from users as quickly as possible.
Designing a Search Engine
Search engine design balances two factors:

‣ Effectiveness – accuracy of results, presentation of results, absence of spam, good ad selection

‣ Efficiency / Performance – response time, concurrency, disaster mitigation, security issues

These factors deeply impact the architecture of these systems. Often the engineering solutions feed back into research (NoSQL, MapReduce, etc.).
Search Engine Basic Building Blocks

Search engine components support two major functions, which are called:

1- The indexing process: it builds the structures that enable searching. The index (an inverted index) is an efficient data structure that represents the documents of a corpus and allows fast searching of those documents.

2- The query process: it uses those structures (the index) and a person’s query to produce a ranked list of documents.
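
The two processes can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the toy corpus are all assumptions made for the example. The query step here only returns the matching documents (Boolean AND) and does not rank them.

```python
from collections import defaultdict

def build_inverted_index(corpus):
    """Indexing process: map each term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}

def search(index, query):
    """Query process (unranked): ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

# Toy corpus, made up for illustration.
corpus = {
    1: "search engines build an index",
    2: "the index enables fast search",
    3: "ranking orders documents",
}
index = build_inverted_index(corpus)
matches = search(index, "index search")  # documents 1 and 2 contain both terms
```

The index is built once and reused for every query, which is why lookups stay fast even though building it requires reading every document.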
Query Process
1. User interaction
It supports the creation and refinement of the user query and displays the results.

2. Ranking
It uses the query and the indexes to create a ranked list of documents.

3. Evaluation
It monitors and measures effectiveness and efficiency. It is primarily done offline.
Query Process
(User Interaction)
The user interaction component provides the interface between the person doing the searching and the search engine.

Its three tasks are:

1- Accepting the user’s query, written in a defined query language, and transforming it into index terms.

- Query transformation: The user interface parses user queries and converts search terms into a form that is acceptable as input to the query engine, i.e. into index terms that appear in the index vocabulary.
- Spell checking and query suggestion propose improvements to the user, or run alternative queries in the background.
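
A rough sketch of query transformation and spell suggestion follows. The stopword list, the index vocabulary, and the function names `to_index_terms` and `suggest` are hypothetical; the spelling suggestion uses `difflib.get_close_matches` from the Python standard library as a stand-in for a real spell checker.

```python
import difflib
import re

# Hypothetical stopword list and index vocabulary, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "for"}
VOCABULARY = ["engine", "index", "query", "ranking", "search"]

def to_index_terms(query):
    """Lowercase, tokenize, drop stopwords, and keep only terms in the vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOPWORDS and t in VOCABULARY]

def suggest(word):
    """Return the closest vocabulary term for a possibly misspelled word, or None."""
    close = difflib.get_close_matches(word.lower(), VOCABULARY, n=1)
    return close[0] if close else None

terms = to_index_terms("The Ranking of a Search-Engine!")
suggestion = suggest("serch")
```

Here `terms` holds the index terms for the query, and `suggestion` is the vocabulary word that could be offered to the user in place of the misspelling.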
User Interaction
[Figure: Query suggestion (a prank)]
User Interaction Component

Its three tasks are:

2- Taking the ranked list of documents from the search engine and organizing it into the results shown to the user.

‣ Displays the top-ranked results

‣ Generates snippets to show how queries match documents

‣ Highlights important words and passages

‣ Retrieves query-relevant advertising
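
Snippet generation and highlighting can be illustrated with a small sketch. The function `make_snippet`, the fixed word window, and the asterisk markers are assumptions chosen for the example. Real engines pick passages with more elaborate scoring.

```python
import re

def make_snippet(text, query_terms, width=5):
    """Return a window of words around the first query-term match,
    with every matching word in the window wrapped in asterisks."""
    words = text.split()
    wanted = {t.lower() for t in query_terms}

    def normalize(word):
        # Strip punctuation so "corpus." still matches the term "corpus".
        return re.sub(r"\W", "", word).lower()

    hit = next((i for i, w in enumerate(words) if normalize(w) in wanted), None)
    if hit is None:
        return " ".join(words[:2 * width])  # no match: fall back to the leading words
    start = max(0, hit - width)
    window = words[start:hit + width + 1]
    return " ".join(f"*{w}*" if normalize(w) in wanted else w for w in window)

doc = "The index is an efficient data structure that allows fast searching of the corpus."
snippet = make_snippet(doc, ["fast", "corpus"], width=4)
```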

User Interaction Component

3- Finally, this component also provides a range of techniques for refining the query so that it better represents the information need.

‣ Query expansion adds terms related to the query terms (e.g. synonyms, related entities)

‣ Relevance feedback runs an initial query, then uses the top-ranked documents to expand the query for a second run
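
Both refinement techniques can be sketched briefly. The synonym table, the function names, and the choice of "most frequent new word" as the feedback criterion are assumptions for illustration; practical systems weight candidate terms more carefully.

```python
from collections import Counter

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile"], "fast": ["quick"]}

def expand_with_synonyms(terms):
    """Query expansion: append known synonyms of each term, avoiding duplicates."""
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

def expand_with_feedback(terms, top_docs, extra=2):
    """Relevance feedback: add the `extra` most frequent new words
    found in the top-ranked documents of an initial run."""
    counts = Counter(
        word
        for doc in top_docs
        for word in doc.lower().split()
        if word not in terms
    )
    return list(terms) + [word for word, _ in counts.most_common(extra)]

query = ["car"]
by_synonyms = expand_with_synonyms(query)
# Assumed to be the top-ranked documents returned by the initial query.
top_docs = ["car engine repair", "engine oil for car", "car engine parts"]
by_feedback = expand_with_feedback(query, top_docs, extra=1)
```

The expanded query from `expand_with_feedback` would then be submitted for the second run.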
Query Process
(Ranking)
Ranking Component
• The ranking component is the core of the search engine.
• It takes the transformed query from the user interaction component and generates a ranked list of documents using scores based on a retrieval model.

• Ranking must be both efficient, since many queries may need to be processed in a short time, and effective, since the quality of the ranking determines whether the search engine accomplishes the goal of finding relevant information.

• The efficiency of ranking depends on the indexes.

• The effectiveness depends on the retrieval model.

Ranking
Document scoring

‣ A score is assigned to each likely-relevant document based on how well it matches the query.

‣ Scoring is a core component of a search engine, and often its most closely guarded secret.

‣ Many approaches and variations have been developed.

‣ The basic form is the dot product of query term weights and corresponding document term weights: $\mathrm{score}(Q, D) = \sum_i q_i \, d_i$, where $q_i$ and $d_i$ are the weights of term $i$ in the query and in the document.
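
The dot-product form of scoring can be written out as a short sketch. The dictionaries of term weights, the document ids, and the function names are hypothetical; how the weights themselves are computed depends on the retrieval model and is not shown here.

```python
def dot_product_score(query_weights, doc_weights):
    """Sum q_i * d_i over the query terms; terms absent from the document contribute 0."""
    return sum(w * doc_weights.get(term, 0.0) for term, w in query_weights.items())

def rank(query_weights, docs):
    """Return (doc_id, score) pairs sorted by descending score."""
    scored = [(doc_id, dot_product_score(query_weights, w)) for doc_id, w in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical term weights (they could be raw term counts, for example).
query = {"search": 1.0, "index": 1.0}
docs = {
    "d1": {"search": 2.0, "engine": 1.0},
    "d2": {"search": 1.0, "index": 3.0},
    "d3": {"ranking": 1.0},
}
ranking = rank(query, docs)
```

With these weights, `d2` scores 4.0, `d1` scores 2.0, and `d3` scores 0.0, so the ranked list puts `d2` first.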
Query Process
(Evaluation)
Evaluation component
• The task of the evaluation component is to measure and monitor effectiveness and efficiency.

• An important part of that is to record and analyze user behavior using log data.

• The results of evaluation are used to tune and improve the ranking component.

• Most of the evaluation component is not part of the online search engine, apart from logging user and system data.

• Evaluation is primarily an offline activity, but it is a critical part of any search application.
Evaluation component
• Logging

‣ Logging user interaction is an essential tool for measuring performance.

‣ Query logs and clickthrough data are used for query suggestion, spell checking, query caching, ranking, advertising search, …

• Query logs of the users’ interactions with the search engine are of paramount importance.

• They can be used to improve the search experience, speed up results, store the results of common queries, and identify sources of new revenue.
Evaluation component

Pages that are clicked or ignored might be logged to improve the overall quality of the search engine, but also to detect patterns in user activity (i.e. data mining).

Query logs can be used for a variety of other purposes, including:
1. Keeping track of a history of user queries
2. Generating spell-checking logs (instead of running the spell checker every time)
3. Recording the time spent on a query or a particular document
4. Supporting query suggestion, spell checking, query caching, ranking, and advertising search, together with clickthrough data
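
A small offline analysis over a query log might look like the sketch below. The log record layout (query, clicked document or `None`, response time in seconds), the sample entries, and the three function names are all invented for illustration; they are not a format the lecture prescribes.

```python
from collections import Counter

# Hypothetical log records: (query, clicked_document_or_None, seconds_to_respond).
log = [
    ("search engine", "doc1", 0.12),
    ("search engine", None, 0.10),
    ("inverted index", "doc7", 0.30),
    ("search engine", "doc1", 0.08),
]

def most_common_queries(records, n=1):
    """Frequent queries are candidates for query suggestion and result caching."""
    return [query for query, _ in Counter(q for q, _, _ in records).most_common(n)]

def click_through_rate(records):
    """Fraction of logged queries that led to a click (a rough effectiveness signal)."""
    return sum(1 for _, clicked, _ in records if clicked is not None) / len(records)

def mean_response_time(records):
    """Average response time in seconds (a simple efficiency measure)."""
    return sum(seconds for _, _, seconds in records) / len(records)

top = most_common_queries(log)
ctr = click_through_rate(log)
latency = mean_response_time(log)
```

Because these statistics are computed from stored logs, they can run offline without slowing down the online engine, which only needs to write the log records.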
