Lecture Notes: CS6007 - Information Retrieval

This document discusses key concepts related to web search engines and crawling. It begins by defining common terms like web servers, browsers, and different paid submission and inclusion programs offered by search services. It then explains search engine optimization (SEO) and how it differs from pay-per-click advertising. The document also defines web crawlers and their purpose, as well as focused crawlers and near-duplicate detection. It concludes by listing requirements for XML information retrieval systems.

Uploaded by

Aswath Ah

Lecture Notes

UNIT III – WEB SEARCH ENGINE – INTRODUCTION AND CRAWLING

Part A – Question Bank

1. Define web server.

A web server is a computer connected to the Internet that runs a program
responsible for storing, retrieving and distributing web files.

2. What is a web browser?

A web browser is a program used to communicate with web
servers on the Internet, which enables it to download and display web pages.
Netscape Navigator and Microsoft Internet Explorer are well-known examples of
browser software.

3. Explain paid submission of search service.

In paid submission, a user submits a website for review by a search service for a preset
fee, with the expectation that the site will be accepted and included in that company’s
search engine, provided it meets the stated guidelines for submission. Yahoo! is the major
search engine that accepts this type of submission. While paid submission guarantees a
timely review of the submitted site and notice of acceptance or rejection, you’re not
guaranteed inclusion or a particular placement order in the listings.

4. Explain paid inclusion programs of search services.

Paid inclusion programs allow you to submit your website for guaranteed
inclusion in a search engine’s database of listings for a set period of time. While paid
inclusion guarantees indexing of submitted pages or sites in a search database, you’re not
guaranteed that the pages will rank well for particular queries.

5. Explain pay-for-placement of search services.

In pay-for-placement, you can guarantee a ranking in a search listing for the terms
of your choice. Also known as paid placement, paid listing, or sponsored listings, this
program guarantees placement in search results. The leaders in pay-for-placement are
Google, Yahoo! and Bing.

6. Define Search Engine Optimization.

Search Engine Optimization is the act of modifying a website to increase its
ranking in the organic, crawler-based listings of search engines. There are several ways to
increase the visibility of your website through the major search engines on the internet
today. The two most common forms of internet marketing are paid placement and natural
placement.

7. Describe the benefits of SEO.

Increase your search engine visibility.
Generate more traffic from the major search engines.
Make sure your website and business get noticed and visited.
Grow your client base and increase business revenue.

8. Explain the difference between SEO and pay-per-click.

SEO                                                                | Pay-per-click
-------------------------------------------------------------------+----------------------------------------------
Results take 2 weeks to 4 months                                   | Results in 1-2 days
Very difficult to control the flow of traffic                      | Can be turned on and off at any moment
Requires ongoing learning and experience to reap results           | Easier for a novice
More difficult to target local markets                             | Ability to target “local” markets
Better for long-term and lower-margin campaigns                    | Better for short-term and high-margin campaigns
Generally more cost-effective; does not penalize for more traffic  | Generally more costly per visitor and per conversion

9. What is a web crawler?

A web crawler is a program which browses the World Wide Web in a methodical,
automated manner. Web crawlers are mainly used to create a copy of all the visited pages
for later processing by a search engine, which indexes the downloaded pages to provide
fast searches.
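
The behaviour described above can be sketched as a small breadth-first crawler. This is only an illustrative sketch, not a production design: the `fetch` argument is a hypothetical caller-supplied function that returns the HTML for a URL (or None on failure), and `max_pages` is an assumed safety limit. Politeness rules such as robots.txt and rate limiting are deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting at seed_url.

    fetch is a hypothetical caller-supplied function mapping a URL to its
    HTML text, or to None if the page cannot be downloaded.
    Returns a dict of url -> html, i.e. the stored copy of each visited page.
    """
    frontier = deque([seed_url])   # URLs waiting to be visited
    seen = {seed_url}              # URLs already queued, to avoid revisits
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html          # keep a copy for later indexing
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; for a quick experiment, a dictionary's `get` method over a handful of in-memory pages is enough.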

10. Define focused crawler.

A focused crawler or topical crawler is a web crawler that attempts to download
only pages that are relevant to a pre-defined topic or set of topics.

11. What are hard and soft focused crawling?

In hard focused crawling, the classifier is invoked on a newly crawled document in a
standard manner. When it returns the best matching category path, the out-neighbors of
the page are checked into the database if and only if some node on the best matching
category path is marked as good.

In soft focused crawling, all out-neighbors of a visited page are checked into the database (DB2), but
their crawl priority is based on the relevance of the current page.
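
The difference between the two policies can be sketched as a single function that decides which out-links enter the crawl frontier. This is a minimal sketch under stated assumptions: the frontier is modelled as an in-memory heap rather than a database, `relevance` and `path_is_good` stand for hypothetical classifier outputs, and the fixed priority used in hard mode is an arbitrary choice, since the notes do not specify one.

```python
import heapq


def enqueue_out_neighbors(frontier, out_links, relevance, path_is_good, mode):
    """Add the out-links of a just-classified page to the crawl frontier.

    frontier      -- heap of (negated priority, url) pairs (an assumption;
                     the notes describe a database instead)
    relevance     -- hypothetical classifier relevance score of the page
    path_is_good  -- True if some node on the best matching category path
                     is marked as good
    mode          -- "hard" or "soft"
    """
    if mode == "hard":
        # Hard focus: out-links are admitted only from pages whose best
        # matching category path contains a node marked as good.
        if not path_is_good:
            return
        priority = 1.0  # assumed constant; not specified in the notes
    elif mode == "soft":
        # Soft focus: every out-link is admitted, and its crawl priority
        # is the relevance of the page that linked to it.
        priority = relevance
    else:
        raise ValueError("mode must be 'hard' or 'soft'")
    for url in out_links:
        # heapq is a min-heap, so negate to pop the highest priority first.
        heapq.heappush(frontier, (-priority, url))
```

In soft mode, links found on a highly relevant page are popped from the heap before links found on a weakly relevant one; in hard mode, links from an off-topic page never enter the frontier at all.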

12. What is near-duplicate detection?

Near-duplicate detection is the task of identifying documents with almost identical content.
Near-duplicate web documents are abundant. Two such documents may differ from each
other only in a very small portion that displays advertisements, for example. Such differences
are irrelevant for web search.
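
One common way to detect such documents, which the notes themselves do not name, is to compare sets of word shingles using Jaccard similarity. The sketch below assumes that approach; the shingle size `k` and the `threshold` value are illustrative choices, not values from the notes.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1, doc2, k=3, threshold=0.8):
    """Treat two documents as near-duplicates if their shingle sets overlap
    by at least the (assumed) threshold."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold
```

Two copies of a page that differ only in a short advertisement share most of their shingles, so their similarity stays high, while unrelated pages share few or none. Large-scale systems usually approximate this comparison with sketches or fingerprints rather than comparing full shingle sets.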

13. What are the requirements of XML information retrieval systems?

A query language that allows users to specify the nature of relevant components, in
particular with respect to their structure.
Representation strategies that provide a description not only of the content of XML
documents, but also of their structure.
Ranking strategies that determine the most relevant elements and rank these
appropriately for a given query.
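
The three requirements can be illustrated with a toy element-level retrieval function. This is a rough sketch only: the element tag stands in for a structural query constraint, the parsed tree stands in for a representation of content plus structure, and raw term counting is a placeholder for a real ranking strategy. The function name `rank_elements` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_text(elem):
    """Return all text inside an element, with whitespace normalised."""
    return " ".join(" ".join(elem.itertext()).split())


def rank_elements(xml_text, tag, query_terms):
    """Rank elements with the given tag by how often query terms occur.

    tag          -- structural constraint: only elements with this tag
                    are considered as answers
    query_terms  -- content constraint: words to look for
    Returns a list of (score, text) pairs, highest score first.
    """
    root = ET.fromstring(xml_text)
    terms = {t.lower() for t in query_terms}
    results = []
    for elem in root.iter(tag):
        text = element_text(elem)
        words = [w.strip(".,;:!?") for w in text.lower().split()]
        score = sum(1 for w in words if w in terms)
        if score > 0:
            results.append((score, text))
    # Stable sort: ties keep document order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


doc = (
    "<book><title>Web crawling</title>"
    "<chapter><title>Crawling basics</title>"
    "<p>A crawler fetches pages.</p></chapter>"
    "<chapter><title>Indexing</title>"
    "<p>Crawling feeds indexing; crawling is first.</p></chapter></book>"
)
ranked_chapters = rank_elements(doc, "chapter", ["crawling"])
```

Changing the tag from "chapter" to "title" returns matching titles instead of whole chapters, which is the sense in which the user specifies the structure of the relevant component.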

CS6007 -Information Retrieval Page 3

www.studentsfocus.com
