
Unit 1

Uploaded by

nanipavan830


UNIT-1

FUNCTIONAL OVERVIEW OF IRS

1. Normalizing Incoming Items:

This step converts the various types of incoming data into a consistent, standard format so that they can be easily processed and searched.

● Language Encoding: Ensure that text in different languages is encoded consistently, typically in Unicode, which allows consistent display and search across languages.
● Different File Formats: Convert items that arrive in various formats (such as text, images, and video) into a standard format for each media type. For example:
○ Videos could be converted to formats such as MPEG-2, MPEG-1, or AVI.
○ Audio files to WAV or Real Audio.
○ Images to GIF, JPEG, or BMP.
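The language-encoding step can be sketched in Python as follows. The sketch assumes that each incoming item arrives as raw bytes tagged with its source encoding. The `normalize_text` helper is hypothetical. It decodes each item to Unicode and then applies NFC normalization, so two visually identical words end up with the same code points.

```python
import unicodedata


def normalize_text(raw: bytes, encoding: str) -> str:
    """Decode bytes from a known source encoding and normalize to Unicode NFC."""
    text = raw.decode(encoding)
    # NFC composes sequences such as "e" + combining acute accent into "é",
    # so the same word always has the same code-point sequence.
    return unicodedata.normalize("NFC", text)


# The same word "café" arriving from two differently encoded sources.
latin1_item = "café".encode("latin-1")
decomposed_utf8_item = "cafe\u0301".encode("utf-8")  # "e" + combining acute

a = normalize_text(latin1_item, "latin-1")
b = normalize_text(decomposed_utf8_item, "utf-8")
print(a == b)  # True
```

Without the normalization call, the two decoded strings would differ in length and would not match in a search.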

2. Logical Restructuring – Zoning:

Break the content down into meaningful sections (zones). For example, when processing an academic paper, divide it into zones such as Title, Author, Abstract, Main Text, Conclusion, References, and Keywords. Zoning enables more precise searching, because a search can be restricted to one zone, and better display of search results.
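Here is a minimal Python sketch of zoning. It assumes a hypothetical format in which each zone of a plain-text item begins with a label such as "Title:" or "Abstract:". Real items would more often be zoned from their own markup. The names `zone_document` and `search_zone` are illustrative only.

```python
import re

# Hypothetical labelled-line format; each label starts a new zone.
ZONE_LABEL = re.compile(
    r"^(Title|Author|Abstract|Main Text|Conclusion|References|Keywords):\s*(.*)$"
)


def zone_document(text: str) -> dict[str, str]:
    """Split a plain-text item into zones. Unlabelled lines continue the
    current zone; text before the first label goes to 'Main Text'."""
    zones: dict[str, list[str]] = {}
    current = "Main Text"
    for line in text.splitlines():
        match = ZONE_LABEL.match(line)
        if match:
            current, line = match.group(1), match.group(2)
        if line.strip():
            zones.setdefault(current, []).append(line.strip())
    return {name: " ".join(parts) for name, parts in zones.items()}


def search_zone(zones: dict[str, str], zone: str, term: str) -> bool:
    """Case-insensitive substring search restricted to a single zone."""
    return term.lower() in zones.get(zone, "").lower()


doc = """Title: Solar Power Storage
Author: A. Writer
Abstract: We compare battery storage options.
Main Text: Lithium cells dominate.
Sodium cells are cheaper.
Keywords: battery, solar"""

zones = zone_document(doc)
print(zones["Main Text"])  # Lithium cells dominate. Sodium cells are cheaper.
print(search_zone(zones, "Abstract", "battery"))  # True
print(search_zone(zones, "Title", "battery"))  # False
```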

3. Creating a Searchable Data Structure (Indexing):

This involves several steps:

1. Identification of Processing Tokens:
○ Processing Tokens: The units of information actually used in searches; they are defined more precisely than ordinary "words".
○ Valid Word Symbols: Alphabetic characters and numbers.
○ Inter-Word Symbols: Blanks, periods, semicolons, and similar characters. They separate words and are not themselves searchable.
○ Special Processing Symbols: Characters, such as hyphens, that need special handling.
2. Words are defined as contiguous sequences of valid word symbols separated by inter-word symbols.
3. Stop Algorithm:
○ Stop Words: Remove words that have little value as search terms. Some are very common (such as 'the' and 'and') and appear in almost every document; others appear very infrequently. Removing them saves system resources.
○ Stop List: A predefined list of such stop words.
4. Characterize Tokens:
○ Word Characteristics: Identify specific features such as proper names, acronyms, numbers, and dates.
○ Part-of-Speech Tagging: Determine whether a word is a noun, verb, etc.
○ Word Sense Disambiguation: Determine the meaning of a word from its context.
5. Stemming Algorithm:
○ Stemming: Reduce words to their base or root form. For example, 'computing', 'computers', and 'computation' are all reduced to 'comput'. This reduces the number of unique words, which saves storage space. It also lets a search on one form of a word match its variants.
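The token-processing steps above can be chained in a short Python sketch. The sketch rests on several assumptions:
- Letters and digits are the valid word symbols, and everything else acts as an inter-word symbol.
- Hyphens are handled by indexing both the parts of a hyphenated word and its joined form. This is only one possible policy.
- The stop list is tiny and fixed; it ignores the "very infrequent words" case.
- `stem` is a toy longest-suffix stripper written for illustration. It is not the Porter stemmer or any other published algorithm.

```python
import re

# Tiny illustrative stop list; real stop lists are much longer.
STOP_LIST = {"the", "and", "a", "an", "of", "to", "in", "is"}

# Toy suffix list, checked longest-first. This is NOT the Porter stemmer.
SUFFIXES = ("ation", "ers", "ing", "er", "s")


def tokenize(text: str) -> list[str]:
    """Split text into processing tokens.

    Runs of letters/digits are words; everything else acts as an inter-word
    symbol. A hyphen is treated as a special processing symbol: the parts of a
    hyphenated word are indexed, plus the joined form (one possible policy).
    """
    tokens: list[str] = []
    for chunk in re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text):
        parts = chunk.lower().split("-")
        tokens.extend(parts)
        if len(parts) > 1:
            tokens.append("".join(parts))
    return tokens


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process(text: str) -> list[str]:
    """Tokenize, apply the stop list, then stem."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_LIST]


print(process("The computers and computing; computation."))
# ['comput', 'comput', 'comput']
print(tokenize("E-mail"))  # ['e', 'mail', 'email']
```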

4. Creating the Searchable Data Structure:

After the processing tokens pass through the stemming algorithm, they are used to update a searchable data structure, such as a signature file, an inverted list, or a PAT tree. This structure represents the semantic concepts of the items in the database, and searches run against it rather than against the raw items. It therefore also limits what a user can find: a concept that is not represented in the structure cannot be retrieved. A well-designed structure enables efficient and accurate retrieval of information.
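An inverted list is one of the structures named above. A minimal Python sketch follows. It assumes the items have already been reduced to stemmed processing tokens, and `build_inverted_index` is a hypothetical helper. The sketch maps each token to the sorted IDs of the items that contain it.

```python
from collections import defaultdict


def build_inverted_index(items: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each processing token to the sorted list of item IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for item_id, tokens in items.items():
        for token in tokens:
            index[token].add(item_id)
    return {token: sorted(ids) for token, ids in index.items()}


items = {
    1: ["comput", "network"],
    2: ["comput", "storag"],
    3: ["network", "secur"],
}
index = build_inverted_index(items)
print(index["comput"])  # [1, 2]
print(index.get("databas", []))  # [] : a concept not in the index cannot be found
```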

Summary:

● Normalization: Convert and standardize different formats and languages.
● Zoning: Break content down into logical sections.
● Token Identification: Identify important searchable tokens and remove unnecessary ones.
● Token Characterization: Determine the specific features and context of tokens.
● Stemming: Reduce words to their base form to save space and let searches match word variants.
● Indexing: Create an internal structure that represents the data and enables efficient searching.

Selective Dissemination of Information (SDI):

SDI is a system that automatically matches new information against users' interests and delivers relevant items to them.

● How it works:
○ Search Process: The system continuously compares newly received items against user profiles.
○ User Profiles: Each user has a profile that describes their interests.
○ User Mail Files: Where the system stores items that match a user's interests.
● User Profile:
○ A broad search statement that describes what the user is interested in.
○ A list of mail files that receive the documents matching the search statement.
○ When a new item matches the profile, it is sent to the associated mail files.
● Difference from Ad Hoc Queries:
○ Profiles contain many search terms and cover a wide range of interests.
○ Ad hoc queries are short and focus on a specific need.
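The SDI flow above can be sketched in Python as follows. The sketch makes these assumptions:
- Each profile's broad search statement has already been reduced to a set of terms.
- An item matches a profile if it shares at least one term with it (a hypothetical rule; real systems use richer matching).
- Mail files are modelled as named lists of item IDs.
- The `Profile` class and `disseminate` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    user: str
    terms: set[str]  # broad search statement, reduced to a term set
    mail_files: list[str]  # mail files that receive matching items


def disseminate(item_id: str, item_terms: set[str], profiles: list[Profile],
                mail: dict[str, list[str]]) -> None:
    """Compare one newly received item against every profile (hypothetical rule:
    an item matches if it shares at least one term with the profile)."""
    for profile in profiles:
        if profile.terms & item_terms:
            for mail_file in profile.mail_files:
                mail.setdefault(mail_file, []).append(item_id)


profiles = [
    Profile("asha", {"solar", "wind", "hydro", "geotherm"}, ["asha-energy"]),
    Profile("ravi", {"cricket", "footbal"}, ["ravi-sports", "ravi-all"]),
]
mail: dict[str, list[str]] = {}
disseminate("doc-101", {"wind", "turbin"}, profiles, mail)
disseminate("doc-102", {"footbal", "leagu"}, profiles, mail)
print(mail)
# {'asha-energy': ['doc-101'], 'ravi-sports': ['doc-102'], 'ravi-all': ['doc-102']}
```

Each new item is checked against every stored profile, which is the reverse of an ad hoc search: there, one query is checked against every stored item.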

Document Database Search:

This allows users to search all items that have been received and stored in the system.

● Components:
○ Search Process: The mechanism that handles searches.
○ User Queries: Specific search statements entered by users.
○ Document Database: The collection of all processed and stored items.
● Characteristics of the Document Database:
○ Items usually do not change once stored.
○ The database can be partitioned by time, which allows older partitions to be archived.
● Difference from Profiles: Queries are short and focused on specific interests.

Index Database Search:

Users can save items for future reference and organize them by building indexes.

● Index Process:
○ Users can add items to an index, along with additional index terms and descriptions.
○ An index record can point to the original item or contain detailed information about it.
● Components:
○ Indexes: Like a library card catalog, they help organize and locate items.
○ Index Database Search Process: Lets users create indexes and search them.
○ Users can search an index and retrieve either the index records themselves or the original items.
● Types of Index Files:
○ Public Index Files: Maintained by library staff; they cover all items in the Document Database.
○ Private Index Files: Created by individual users; each user can have multiple private indexes.

Combined File Search:

This process integrates searches across both the Document Database and the index databases.

● Public vs. Private Index Files:
○ Public index files cover all items and are accessible to all users.
○ Private index files belong to individual users and cover a smaller subset of items.
● Database Management System:
○ Index files are often managed using a relational database management system (RDBMS).

Automatic File Build (Information Extraction):

This process helps create index records automatically.

● How it works:
○ The system processes new documents and identifies key information such as authors, publication date, source, and references.
○ Rules that determine which documents to process and how to extract index terms are stored in Automatic File Build Profiles.
● Candidate Index Records:
○ The output of processing new documents.
○ Users review and edit them before they are added to the actual index file.
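A minimal Python sketch of an Automatic File Build pass follows. Here the "profile" is a hypothetical dictionary with two parts: the set of sources to process, and one regular expression per index field. The field names, patterns, and the `build_candidate` helper are all illustrative assumptions. The output is a candidate record for a user to review. A field the rules cannot find is left as `None` for the reviewer to fill in.

```python
import re

# Hypothetical Automatic File Build profile: which sources to process and
# one regular expression per index field to extract.
AFB_PROFILE = {
    "sources": {"Reuters", "AP"},
    "fields": {
        "author": re.compile(r"^By\s+(.+)$", re.MULTILINE),
        "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    },
}


def build_candidate(doc_id: str, source: str, text: str, profile=AFB_PROFILE):
    """Return a candidate index record for user review, or None if the
    profile says documents from this source should not be processed."""
    if source not in profile["sources"]:
        return None
    record = {"doc_id": doc_id, "source": source}
    for field, pattern in profile["fields"].items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None  # blank for the reviewer
    return record


text = "Markets rally\nBy Priya Nair\nPublished 2024-03-15 in Mumbai."
print(build_candidate("n-7", "Reuters", text))
# {'doc_id': 'n-7', 'source': 'Reuters', 'author': 'Priya Nair', 'date': '2024-03-15'}
print(build_candidate("n-8", "Blog", text))  # None
```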

Summary:

● SDI: Automatically matches new items to user interests and delivers relevant information.
● Document Database Search: Allows users to search all stored items.
● Index Database Search: Enables users to save, organize, and search items using indexes.
● Combined File Search: Integrates document and index searches.
● Automatic File Build: Automates the creation of index records by extracting key information from new documents.

DIGITAL LIBRARY

DATA WAREHOUSE

IRS CAPABILITIES

Boolean Logic:

● Boolean logic lets users combine search terms with operators such as AND, OR, and NOT. For instance, "cats AND dogs" retrieves items containing both words, "cats OR dogs" retrieves items containing either word, and "cats NOT dogs" retrieves items that contain "cats" but not "dogs."
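Over an inverted list, the three Boolean operators reduce to set operations on the lists of item IDs. The postings below are made-up sample data for a hypothetical five-item collection.

```python
# Hypothetical postings: term -> set of IDs of the items that contain it.
postings = {
    "cats": {1, 2, 4},
    "dogs": {2, 3, 4},
}
all_items = {1, 2, 3, 4, 5}

cats, dogs = postings["cats"], postings["dogs"]
print(sorted(cats & dogs))  # cats AND dogs -> [2, 4]
print(sorted(cats | dogs))  # cats OR dogs  -> [1, 2, 3, 4]
print(sorted(cats - dogs))  # cats NOT dogs -> [1]
print(sorted(all_items - dogs))  # NOT dogs on its own -> [1, 5]
```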

Proximity:

● Proximity search looks for words that appear within a specified distance of each other. For example, searching for "bake NEAR/5 cake" finds instances where "bake" and "cake" appear within five words of each other, which helps locate related terms in context. The exact operator syntax varies between systems.
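Proximity search needs word positions, not just item IDs. The Python sketch below assumes that "within k words" means the positions of the two words differ by at most k, in either order. Systems differ on this point: some count intervening words instead, and some enforce word order. The helper names `positions` and `near` are hypothetical.

```python
def positions(tokens: list[str], term: str) -> list[int]:
    """Word positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(tokens: list[str], a: str, b: str, k: int) -> bool:
    """True if `a` and `b` occur within `k` word positions of each other."""
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    return any(abs(i - j) <= k for i in pos_a for j in pos_b)


close = "bake a simple chocolate cake".split()
far = "bake the sponge for forty minutes then frost the cake".split()
print(near(close, "bake", "cake", 5))  # True  (positions 0 and 4)
print(near(far, "bake", "cake", 5))  # False (positions 0 and 9)
```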

Contiguous Word Phrases:

● This capability searches for exact phrases, in which the words appear next to each other in the same order. For example, searching for "climate change" returns only items where these two words are adjacent and in that order, which preserves the specific meaning of the phrase.
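A contiguous phrase match can be sketched in Python by sliding a window the length of the phrase over the item's word sequence. The helper `contains_phrase` is a hypothetical name. This sketch scans the tokens directly; an indexed system would instead check that the positions of the phrase's words are consecutive.

```python
def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if the words of `phrase` appear consecutively and in order."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


print(contains_phrase("climate change affects crops".split(), ["climate", "change"]))  # True
print(contains_phrase("change in the climate".split(), ["climate", "change"]))  # False
```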

Fuzzy Searches:

● Fuzzy searches find words that are similar to the search term, accommodating spelling variations and typos. For example, searching for "color" might also return "colour." This is useful for documents that contain typographical errors or different spellings of the same word.
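One common way to measure similarity for fuzzy search is the Levenshtein edit distance. The Python sketch below uses it, assuming a match threshold of one edit. Both the threshold and the helper names are illustrative; real systems use various similarity measures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,  # delete ca
                            curr[j - 1] + 1,  # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, vocabulary: list[str], max_edits: int = 1) -> list[str]:
    """Vocabulary words within `max_edits` edits of the search term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]


print(fuzzy_match("color", ["colour", "colors", "collar", "cooler"]))
# ['colour', 'colors']
```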

Term Masking:

● Term masking uses wildcards to stand in for characters in a search term, which broadens the search. For example, "comp*" can find "computer," "compete," and "compile." The asterisk (*) represents any number of characters, while a question mark (?) replaces exactly one character.
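Python's standard `fnmatch` module implements the same two wildcards, so it can illustrate term masking. In this sketch, a masked term is expanded against a sample vocabulary (made-up data) before searching. Note that `fnmatch` also treats `[...]` as a character set, which a real IRS query language may not do.

```python
from fnmatch import fnmatchcase

vocabulary = ["computer", "compete", "compile", "company", "camp", "cat", "cut", "coat"]


def mask(pattern: str, words: list[str]) -> list[str]:
    """Expand a masked term: '*' matches any run of characters, '?' exactly one."""
    return [w for w in words if fnmatchcase(w, pattern)]


print(mask("comp*", vocabulary))  # ['computer', 'compete', 'compile', 'company']
print(mask("c?t", vocabulary))  # ['cat', 'cut']
```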

Numeric & Date Ranges:

● This capability allows searching within specific numeric or date ranges. For example, a user might look for documents published from 2010 to 2020, or for products priced between $50 and $100. It filters search results on quantitative criteria such as dates or numbers.
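A range search can be sketched in Python as a filter on a typed field. The records and field names below are made-up sample data. The sketch also assumes closed ranges, meaning both endpoints are included.

```python
from datetime import date

items = [
    {"id": 1, "published": date(2008, 5, 1), "price": 45.0},
    {"id": 2, "published": date(2012, 1, 9), "price": 80.0},
    {"id": 3, "published": date(2019, 7, 30), "price": 120.0},
    {"id": 4, "published": date(2021, 2, 14), "price": 60.0},
]


def in_range(items, field, low, high):
    """Return IDs of items whose `field` lies in the closed range [low, high]."""
    return [it["id"] for it in items if low <= it[field] <= high]


print(in_range(items, "published", date(2010, 1, 1), date(2020, 12, 31)))  # [2, 3]
print(in_range(items, "price", 50, 100))  # [2, 4]
```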

Concept & Thesaurus Expansions:

● This capability adds related concepts or synonyms to a query to broaden the results. For example, a search for "happy" might also retrieve items containing "joyful" or "content." Thesaurus expansion makes searching more flexible by covering variations in terminology, which yields more comprehensive results.
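Here is a minimal Python sketch of thesaurus expansion using a small hand-built thesaurus. Both the `THESAURUS` contents and the `expand` helper are hypothetical. The expanded terms are then combined with OR semantics against made-up documents.

```python
# Hypothetical hand-built thesaurus mapping a term to its synonyms.
THESAURUS = {
    "happy": ["joyful", "content"],
    "car": ["automobile"],
}


def expand(query_terms: list[str]) -> list[str]:
    """Return the query terms followed by their thesaurus synonyms."""
    expanded: list[str] = []
    for term in query_terms:
        expanded.append(term)
        expanded.extend(THESAURUS.get(term, []))
    return expanded


docs = {1: {"a", "joyful", "crowd"}, 2: {"a", "sad", "crowd"}, 3: {"happy", "child"}}
terms = expand(["happy"])
print(terms)  # ['happy', 'joyful', 'content']
print(sorted(d for d, words in docs.items() if words & set(terms)))  # [1, 3]
```

Without expansion, only document 3 would match; the synonym "joyful" brings in document 1 as well.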

Natural Language Queries:

● Natural language queries let users search using everyday language rather than keywords. For example, a user might ask, "What is the capital of France?" The system interprets the question and retrieves relevant information, which makes searching more intuitive.

Multimedia Queries:

● Multimedia queries enable searching for non-textual content such as images, video, and audio. For example, a user might look for all videos related to "wildlife." This capability is essential for databases that contain diverse media types, because it lets users locate non-textual information easily.

Browse Capabilities

1. Ranking:
○ Ranking orders search results by relevance or importance so that users see the most relevant items first. Criteria can include keyword matches, document popularity, or publication date. For example, a search for "renewable energy" shows the most relevant articles at the top.
2. Zoning:
○ Zoning divides a document into logical sections such as title, author, abstract, and main text. This allows targeted searching within specific sections. For example, a user might search only the "abstract" zone to find articles with relevant summaries.
3. Highlighting:
○ Highlighting visually emphasizes the search terms in the results. When a user searches for a keyword, each occurrence of it is highlighted in the displayed documents, which makes the relevant information easier to spot.
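Ranking and highlighting can be sketched together in Python. The sketch scores documents with a simple TF-IDF weight: term frequency times log(N / document frequency). This is only one of many possible ranking functions and not necessarily the one any given IRS uses. Highlighting wraps each query term in `<mark>` tags, which is an assumed display convention. The documents and helper names are hypothetical.

```python
import math
import re

docs = {
    "d1": "renewable energy from wind and solar energy",
    "d2": "energy prices rose again",
    "d3": "renewable materials in construction",
}


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Order documents by a simple TF-IDF score, highest first."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokenized)
    idf = {}
    for term in query.lower().split():
        df = sum(term in toks for toks in tokenized.values())
        if df:  # terms found in no document contribute nothing
            idf[term] = math.log(n / df)
    scores = {d: sum(toks.count(t) * w for t, w in idf.items())
              for d, toks in tokenized.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def highlight(text: str, terms: list[str]) -> str:
    """Wrap whole-word, case-insensitive occurrences of each term in <mark> tags."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<mark>\1</mark>", text)


print(rank("renewable energy", docs)[0][0])  # d1
print(highlight("Renewable energy from wind", ["renewable", "energy"]))
# <mark>Renewable</mark> <mark>energy</mark> from wind
```

Document d1 ranks first because it contains both query terms, and "energy" occurs in it twice.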

Miscellaneous Capabilities

1. Vocabulary Browse:
○ Vocabulary browsing lets users explore terms and their relationships within a specific domain or subject. It often involves browsing an index or thesaurus to find related terms for expanding a search. For example, a user might explore synonyms and related terms for "biodiversity."
2. Iterative Search & Search History Log:
○ Iterative search refines a search based on previous results to narrow down to the most relevant information. The search history log records all search queries, so users can revisit and refine past searches.
3. Canned Query:
○ Canned queries are predefined, saved searches for common information needs. They can be executed quickly without re-entering the search criteria. For example, a canned query for "latest technology news" would fetch up-to-date articles on that topic.
4. Multimedia:
○ Multimedia capabilities cover searching for and retrieving content such as images, video, and audio. For instance, users can search for educational videos, photographs, or music files, which enables a richer and more diverse search experience.
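A search history log and canned queries can be sketched as a small Python session object. The sketch assumes these points:
- `SearchSession` and its method names are hypothetical.
- The underlying search engine is any function that maps a query string to results.
- Refining a query means AND-ing an extra clause onto the most recent query in the log.

```python
class SearchSession:
    """Keeps a search history log and a set of named canned queries
    (class and method names are hypothetical)."""

    def __init__(self, search_fn):
        self.search_fn = search_fn  # any function: query string -> results
        self.history: list[str] = []  # search history log
        self.canned: dict[str, str] = {}  # name -> saved query string

    def search(self, query: str):
        self.history.append(query)
        return self.search_fn(query)

    def refine(self, extra: str):
        """Iterative search: narrow the most recent query with an extra AND clause."""
        return self.search(f"({self.history[-1]}) AND {extra}")

    def save_canned(self, name: str, query: str) -> None:
        self.canned[name] = query

    def run_canned(self, name: str):
        return self.search(self.canned[name])


session = SearchSession(lambda q: f"results for {q}")  # stand-in search engine
session.search("technology news")
print(session.refine("2024"))  # results for (technology news) AND 2024
session.save_canned("tech", "latest technology news")
print(session.run_canned("tech"))  # results for latest technology news
print(session.history)
# ['technology news', '(technology news) AND 2024', 'latest technology news']
```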

