20AIEL708 MINING MASSIVE DATASETS L T P C
3 0 0 3
Course Objectives:
To understand the basic concepts, principles, and techniques in data mining
To learn the classical data mining algorithms
To systematically analyze real-world data mining problems
To model data mining problems and evaluate, visualize and communicate
statistical models
UNIT I INTRODUCTION 9
Data Mining – Modeling, Statistical Limits on Data Mining, Importance of Words in
Documents, Hash Functions, Indexes. MapReduce and the New Software Stack –
Distributed File Systems, MapReduce, Algorithms Using MapReduce, Extensions to
MapReduce, The Communication Cost Model, Complexity Theory for MapReduce.
UNIT II RECOMMENDATION SYSTEM 9
Finding Similar Items – Applications of Set similarity, Shingling of Documents,
Similarity-Preserving Summaries of Sets, Locality-Sensitive Hashing for Documents,
Distance Measures, Theory of Locality-Sensitive Functions, LSH Families for Other
Distance Measures, Applications of LSH, Methods for High Degrees of Similarity.
UNIT III SOCIAL NETWORKS 9
Mining Data Streams – The Stream Data Model, Sampling Data in a stream, Filtering
Streams, Counting Distinct Elements in a Stream, Estimating Moments, Counting Ones
in a Window, Decaying Windows. Link Analysis – PageRank, Efficient Computation of
PageRank, Topic-Sensitive PageRank, Link Spam, Hubs and Authorities.
UNIT IV MINING DATA STREAMS 9
Frequent Itemsets – Market-Basket Model, A-Priori Algorithm, Handling Larger Datasets
in Main Memory, Limited-Pass Algorithms, Counting Frequent Items in a Stream.
Clustering – Introduction, Hierarchical Clustering, K-Means Clustering, CURE
Algorithm, Clustering in Non-Euclidean Spaces, Clustering for Streams and Parallelism.
Advertising on the Web – Issues in Online Algorithms, The Matching Problem, The
Adwords Problem, Adwords Implementation.
UNIT V CONTEXTUAL-BANDIT APPROACH 9
Recommendation System – Content-Based Recommendations: Item Profiles, Discovering
Features of Documents, Obtaining Item Features from Tags, Representing Item
Profiles, User Profiles, Recommending Items to Users Based on Content, Classification
Algorithms, A Contextual-Bandit Approach to Personalized News Article
Recommendation.
TEXTBOOK
1. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, “Mining of Massive
Datasets”, 2019.
COURSE OUTCOMES:
1. Understand the fundamental concepts and statistical limits of data mining and
the MapReduce programming model. (K2)
2. Understand similarity detection techniques to identify similar items in large
datasets. (K2)
3. Apply appropriate algorithms to perform link analysis and data stream mining in
large-scale networks. (K3)
4. Analyze clustering methods and frequent itemset mining techniques for handling
large-scale and streaming datasets. (K4)
5. Understand content-based recommendation systems and contextual-bandit
approaches for personalization. (K2)
6. Analyze the efficiency, scalability, and computational cost of different applications
and platforms. (K4)
CO-PO, PSO MAPPING:
     PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2
CO1  3    -    3    1    -    -    -    -    2    1     3     -     3     2
CO2  3    -    2    3    -    -    1    -    -    -     2     -     2     1
CO3  3    -    3    3    -    -    -    -    -    -     3     -     3     1
CO4  3    -    3    3    -    -    -    -    -    -     3     -     2     2
CO5  3    -    3    3    -    -    -    -    -    -     3     -     3     2
CO6  3    -    3    3    -    -    -    -    -    -     3     -     3     2