IR
1. To avoid linearly scanning the texts for each query, the documents should be _______ in
advance.
A. Preprocessed B. designed
C. Formatted D. indexed
2. Querying of unstructured textual data is referred to as
A. Information access B. Information updation
C. Information manipulation D. Information retrieval
3. The ________ refers to being able to pose any query as an expression of terms combined with
AND, OR, and NOT.
A. Index B. Incidence matrix
C. Binary retrieval model D. Boolean retrieval model
4. A better way to represent a term-document matrix is a(n) ______, in which we record only the things that
do occur and their links.
A. Incidence matrix B. Adjacency matrix
C. Index D. Inverted index
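Note: the idea of recording only the term-document pairs that actually occur can be sketched as below. This is a minimal illustration, not a reference implementation; the toy documents and the names `build_inverted_index` and `intersect` are assumptions made for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted postings list of the docIDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Only (term, docID) pairs that occur are stored; absent pairs cost nothing.
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Boolean AND of two docID-sorted postings lists via a linear merge."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical toy collection, keyed by docID.
docs = {
    1: "brutus killed caesar",
    2: "caesar was ambitious",
    3: "brutus and caesar",
}
index = build_inverted_index(docs)
both = intersect(index["brutus"], index["caesar"])
```

The merge in `intersect` relies on both postings lists being sorted by docID, which is why it can run in time linear in the combined list lengths.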
5. A dictionary of terms is sometimes also referred to as a
A. Lexicon B. Collection
C. Corpus D. none of the above
6. _________ retrieval requires term frequency information for documents in the postings list.
A. Boolean Retrieval B. Frequent Retrieval
C. current Retrieval D. Ranked Retrieval
7. Edit distance (Levenshtein distance) is a way of:
A. Context-sensitive spelling correction B. Document correction
C. Isolated word correction D. Phonetic correction
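Note: for readers who want to see how edit distance is computed, the following is a minimal sketch of the standard dynamic-programming formulation with unit costs for insertion, deletion, and substitution. The function name `edit_distance` is an assumption made for this example.

```python
def edit_distance(s1, s2):
    """Levenshtein distance between s1 and s2 with unit edit costs."""
    m, n = len(s1), len(s2)
    # prev[j] holds the distance between the first i-1 chars of s1
    # and the first j chars of s2.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a character from s1
                curr[j - 1] + 1,     # insert a character into s1
                prev[j - 1] + cost,  # substitute, or copy if equal
            )
        prev = curr
    return prev[n]
```

Keeping only two rows of the table reduces memory to O(n) while the running time stays O(m * n).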
8. The Boolean retrieval model makes no provision for:
A. Ranked search B. Proximity search
C. Phrase search D. Both proximity and ranked search.
9. Permuterm indices are used for solving
A. None B. Boolean queries
C. Phrase queries D. Wildcard queries
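Note: a permuterm index stores every rotation of each term with an end marker `$` appended. The sketch below is a minimal illustration under stated assumptions: the names `permuterm_rotations`, `build_permuterm_index`, and `wildcard_lookup` are invented for this example, the query must contain exactly one `*`, and a linear scan over the keys stands in for the prefix search that a B-tree would normally provide.

```python
def permuterm_rotations(term):
    """All rotations of term + '$', e.g. 'man' -> 'man$', 'an$m', 'n$ma', '$man'."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]

def build_permuterm_index(vocabulary):
    """Map every rotation back to the original vocabulary term."""
    return {rot: term for term in vocabulary for rot in permuterm_rotations(term)}

def wildcard_lookup(query, permuterm_index):
    """Answer a query with exactly one '*', such as 'm*n' or 'ma*'."""
    prefix, suffix = query.split("*")
    # Rotate the query so the wildcard ends up at the end: X*Y -> Y$X*
    key = suffix + "$" + prefix
    # Linear scan used here for brevity; a real index would do a prefix search.
    return sorted({term for rot, term in permuterm_index.items() if rot.startswith(key)})

# Hypothetical toy vocabulary.
permuterm = build_permuterm_index(["man", "moon", "mat"])
matches = wildcard_lookup("m*n", permuterm)
```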
10. A large repository of documents in IR is called a:
A. Corpus B. Database
C. Dictionary D. Collection
11. A benefit of using a hash table is:
A. There is no need to rehash everything periodically if the vocabulary keeps growing.
B. Lookup in a hash table is faster than lookup in a tree.
C. All of the above
D. No prefix search is required
12. Benefits of using B-trees:
A. Re-balancing is cheap B. Balanced trees allow efficient retrieval
C. Faster O(log M) D. Solves the prefix problem.
13. Postings lists should be sorted by:
A. Document Frequency B. DocID
C. TermID D. Term frequency
14. The goal of IR is to:
A. find documents relevant to an information need
B. find documents relevant to an information need from a given document set
C. find documents relevant to an information need from a large document set
D. find documents relevant to an information need from a small document set
15. A term-document incidence matrix is:
A. Sparse B. Depends upon the data
C. Dense D. Cannot predict
16. A ______________ is a list of the observed categories and a count of the number of
observations in each.
A. Matrix B. Frame
C. Frequency distribution D. None
17. The document frequency of a term is the:
A. Number of documents that contain the term.
B. None of the above.
C. Number of times the term appears in the document
D. Number of times the term appears in the collection.
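Note: the three counts that the options above distinguish can be computed side by side. This is a minimal sketch on a hypothetical toy collection; the names `term_frequency`, `document_frequency`, and `collection_frequency` are assumptions made for this example, and tokenization is plain whitespace splitting.

```python
# Hypothetical toy collection, keyed by docID.
docs = {
    1: "to be or not to be",
    2: "to do is to be",
    3: "do be do",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

def term_frequency(term, doc_id):
    """Number of times the term appears in one document."""
    return tokens[doc_id].count(term)

def document_frequency(term):
    """Number of documents that contain the term at least once."""
    return sum(1 for toks in tokens.values() if term in toks)

def collection_frequency(term):
    """Number of times the term appears across the whole collection."""
    return sum(toks.count(term) for toks in tokens.values())
```

For the term "to" in this toy collection, the three counts differ: it occurs twice in document 1, appears in two documents, and occurs four times overall.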
18. Boolean queries often result in:
A. Too many or too few results B. None of the above.
C. Too few results
D. Too many results.
19. The more frequent the query term in the document is:
A. The lower the score of the document.
B. It does not have any effect.
C. The higher the score of the document.
D. None of the above.
20. The Jaccard coefficient is:
A. |X ∪ Y| / |X ∩ Y|
B. |X ∩ Y| / |X ∩ Y|
C. |X ∩ Y| / |X ∪ Y|
D. |X ∩ Y|
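Note: the Jaccard coefficient measures the overlap between two sets as the size of their intersection relative to the size of their union. The sketch below is a minimal illustration; the function name `jaccard` and the choice to return 1.0 for two empty sets are assumptions made for this example.

```python
def jaccard(x, y):
    """Overlap of two collections treated as sets; the result lies in [0, 1]."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumed convention for two empty sets, chosen for this sketch
    return len(x & y) / len(x | y)

# Hypothetical query and document term sets.
query_terms = {"ides", "of", "march"}
doc_terms = {"caesar", "died", "in", "march"}
score = jaccard(query_terms, doc_terms)
```

Here the two sets share one term out of six distinct terms, so the score is 1/6.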
21. Wildcard queries can be solved using:
A. Inverted index
B. Permuterm index
C. Binary Tree
D. None
22. Soundex is a class of heuristics to expand a query into its
A. synonyms
B. phonetic equivalents
C. similar words
D. None
23. _________ is a representation of a term-document matrix in which we record only the things that do occur and their
links.
A. Incidence matrix.
B. Adjacency matrix.
C. Index
D. Inverted index
24. Edit distance (Levenshtein distance) is a technique which can be used in:
A. Context-sensitive spelling correction
B. Document correction
C. Isolated word correction
D. Phonetic correction
25. The Boolean retrieval model cannot be used for:
A. Ranked search
B. Proximity search
C. Phrase search
D. Both proximity and ranked search.
26. Which of the following statements is true of B-trees:
A. Re-balancing is cheap
B. Balanced trees allow efficient retrieval
C. Faster O(log M)
D. Solves the prefix problem.
27. Extremely common words that appear to be of little value in information
retrieval and are excluded from the index vocabulary are called:
28. The process of chopping off the ends of a word to reduce it to its root form, thereby reducing the
size of the vocabulary, is called:
A. Lemmatization B. Case folding C. Truecasing D. Stemming
29. Which of the following is a technique for context-sensitive spelling correction:
30. Given two strings s1 and s2, the edit distance between them is sometimes known as the:
A. Levenshtein distance B. Isolated-term distance C. k-gram overlap D. Jaccard coefficient
31. A measure of similarity between two vectors which is determined by measuring the angle
between them is called:
D. None
ii. Crawler
iv. indexing
v. Store in database
a. ii ,iii, i,iv,v
b. ii, i, iii,iv,v
c. i, ii,iii,iv,v
d. i, iv, v, iii, ii
40. XQuery is a functional query language used to retrieve information stored in _______ format.
a. Html
b. Xml
c. Uml
d. Jscript
41. The XPath specification has _________ types of nodes
a) Four
b) Five
c) Six
d) Seven