Date:
▪ Monday, March 11, 2:00 PM – Wednesday, March 13, 2:00 PM Pacific Time
Logistics:
▪ Administered on Gradescope
▪ 3 hours long (the timer starts once you open the exam)
▪ Submitting answers (all questions are visible at the same time):
▪ One PDF for the entire exam (uploaded at the top of the exam), or
▪ One PDF for each question (uploaded to each question), or
▪ Write answers directly in the text boxes
▪ You can do this as you go through the questions (no need to wait until the end)
▪ Please budget your time for submission (~10 min) and solve the questions you find easy first – the exam tends to be on the longer side
If you think a question isn't clear on the exam...
▪ Ask on Ed, or state your (reasonable and valid) assumptions in your answer
▪ We will actively monitor Ed on:
▪ Monday: 2 PM – 10 PM PT
▪ Tuesday: 8 AM – 3 PM, 5 PM – 10 PM PT
▪ Wednesday: 8 AM – 2 PM PT
▪ We will answer clarifying questions only
Exam Review Session: Friday, 6 PM – 7 PM PT via Zoom (see Ed and Canvas for details)
The final exam is open book and open notes
A calculator or computer is REQUIRED
▪ You may only use your computer to do arithmetic calculations (i.e., the buttons found on a standard scientific calculator)
▪ You may also use your computer to read the course notes or the textbook
▪ No use of AI chatbots (including, but not limited to, ChatGPT)
▪ No collaboration with other students
Practice finals are posted on Ed and Gradescope
Good luck with the exam! ☺
You Have Done a Lot!!!
And (hopefully) learned a lot!!!
▪ Answered questions and proved many interesting results
▪ Implemented a number of methods
Thank You for the Hard Work!
Note to other teachers and users of these slides: We would be delighted if you found our
material useful for giving your own lectures. Feel free to use these slides verbatim, or to
modify them to fit your own needs. If you make use of a significant portion of these slides
in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu
Redundancy leads to a bad user experience
▪ Uncertainty around the user's information need ⇒ don't put all your eggs in one basket
How do we optimize for diversity directly?
Example front page, Monday, January 14:
▪ France intervenes
▪ Chuck for Defense
▪ Argo wins big
▪ Hagel expects fight (redundant: another Chuck Hagel story)
A more diverse front page for the same day:
▪ France intervenes
▪ Chuck for Defense
▪ Argo wins big
▪ New gun proposals
Idea: Encode diversity as a coverage problem
Example: Word cloud of news for a single day
▪ Want to select articles so that most words are "covered"
Q: What is being covered?
A: Concepts (in our case: named entities)
[Figure: word cloud of concepts – France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL – with the document "Hagel expects fight" covering some of them]
Q: Who is doing the covering?
A: Documents
Suppose we are given a set of documents D
▪ Each document d covers a set X_d of words/topics/named entities from W
For a set of documents A ⊆ D we define
F(A) = |⋃_{d∈A} X_d|
Goal: We want to
max_{|A| ≤ k} F(A)
Note: F(A) is a set function, F : Sets → ℕ (it counts the covered elements, which is why we take the size of the union)
Given a universe of elements W = {w₁, …, w_n}
and sets X₁, …, X_m ⊆ W
[Figure: sets X₁, X₂, X₃, X₄ inside universe W]
Goal: Find k sets X_i that cover the most of W
▪ More precisely: find the k sets X_i whose union is largest
▪ Bad news: a known NP-complete problem
Simple Heuristic: Greedy Algorithm:
Start with A₀ = { }
For i = 1 … k:
▪ Find the document d maximizing F(A_{i−1} ∪ {d})
▪ Let A_i = A_{i−1} ∪ {d}
where F(A) = |⋃_{d∈A} X_d|
Example:
▪ Evaluate F({d₁}), …, F({d_m}); pick the best (say d₁)
▪ Evaluate F({d₁} ∪ {d₂}), …, F({d₁} ∪ {d_m}); pick the best (say d₂)
▪ Evaluate F({d₁, d₂} ∪ {d₃}), …, F({d₁, d₂} ∪ {d_m}); pick the best
▪ And so on…
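This loop is easy to state in code. Below is a minimal sketch (mine, not the lecture's) of the greedy heuristic in Python; the document-to-concept map X and the budget k are hypothetical inputs:

```python
# Greedy max coverage: repeatedly add the document whose concepts
# add the most not-yet-covered elements.
def greedy_max_coverage(X, k):
    """X: dict doc_id -> set of covered concepts; k: number of docs to pick."""
    selected, covered = [], set()
    for _ in range(min(k, len(X))):
        best = max(X.keys() - set(selected),
                   key=lambda d: len(X[d] - covered))  # marginal gain
        selected.append(best)
        covered |= X[best]
    return selected, covered

# Tiny example: d1 has the largest initial gain, then d2 adds the most new concepts
X = {"d1": {"France", "Mali", "Argo"},
     "d2": {"Hagel", "Pentagon"},
     "d3": {"France", "Hagel"}}
print(greedy_max_coverage(X, 2))  # (['d1', 'd2'], all five concepts covered)
```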
Goal: Maximize the covered area
[Figure: greedy grows the covered area one set at a time]
[Figure: three overlapping sets A, B, and C]
Goal: Maximize the size of the covered area
Greedy first picks A and then C,
but the optimal choice would be to pick B and C
Greedy produces a solution A where:
F(A) ≥ (1 − 1/e) · OPT, i.e., F(A) ≥ 0.63 · OPT
[Nemhauser, Fisher, Wolsey '78]
The claim holds for functions F(·) with 2 properties:
▪ F is monotone (adding more docs doesn't decrease coverage):
if A ⊆ B then F(A) ≤ F(B), and F({}) = 0
▪ F is submodular:
adding an element to a set gives less improvement than adding it to one of its subsets
Definition:
A set function F(·) is called submodular if:
For all A, B ⊆ W:
F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
[Figure: Venn diagrams of A and B, their union A ∪ B, and their intersection A ∩ B]
Diminishing returns characterization
Equivalent definition:
A set function F(·) is called submodular if:
For all A ⊆ B (and d ∉ B):
F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)
Gain of adding d to a small set ≥ gain of adding d to a large set
[Figure: adding d to the small set A yields a large improvement; adding d to the large set B yields a small improvement]
F(·) is submodular: for A ⊆ B,
F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)
Gain of adding d to a small set ≥ gain of adding d to a large set
Natural example:
▪ Sets d₁, …, d_m (regions of the plane)
▪ F(A) = |⋃_{i∈A} d_i| (size of the covered area)
▪ Claim: F(A) is submodular!
Intuition: d's gain is the part of d not yet covered, and a larger collection already covers more of d
[Figure: region d overlaps the area covered by the small collection A less than the area covered by the large collection B]
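A quick numeric sanity check of the claim on a tiny hypothetical instance (sets of integers standing in for areas):

```python
# Coverage function F(A) = |union of X[d] for d in A| and a check of
# the diminishing-returns inequality F(A+{d}) - F(A) >= F(B+{d}) - F(B).
X = {"d1": {1, 2}, "d2": {2, 3}, "d3": {3, 4, 5}}

def F(A):
    return len(set().union(*(X[d] for d in A))) if A else 0

A, B, d = {"d1"}, {"d1", "d2"}, "d3"        # A is a subset of B
gain_small = F(A | {d}) - F(A)              # d3 adds {3, 4, 5}: gain 3
gain_large = F(B | {d}) - F(B)              # 3 is already covered: gain 2
assert gain_small >= gain_large             # 3 >= 2: submodularity holds here
```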
Submodularity is the discrete analogue of concavity
[Figure: a concave curve of F(·) against solution size; since A ⊆ B, the step from F(A) to F(A ∪ {d}) is larger than the step from F(B) to F(B ∪ {d}) – adding d to B helps less than adding it to A!]
F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)
Gain of adding d to a small set ≥ gain of adding d to a large set
Marginal gain:
Δ_F(d | A) = F(A ∪ {d}) − F(A)
Submodularity: for A ⊆ B,
F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B)
Concavity: for a ≤ b,
f(a + d) − f(a) ≥ f(b + d) − f(b)
[Figure: a concave curve of F(A) against |A|]
Let F₁, …, F_m be submodular and λ₁, …, λ_m > 0;
then F(A) = Σ_{i=1}^{m} λ_i F_i(A) is submodular
▪ Submodularity is closed under non-negative linear combinations!
This is an extremely useful fact:
▪ The average of submodular functions is submodular: F(A) = Σ_i P(i) · F_i(A)
▪ Multicriterion optimization: F(A) = Σ_i λ_i F_i(A)
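The closure fact follows in one line from the diminishing-returns definition; a sketch of the argument in LaTeX:

```latex
% For A \subseteq B, each F_i satisfies diminishing returns:
%   F_i(A \cup \{d\}) - F_i(A) \;\ge\; F_i(B \cup \{d\}) - F_i(B).
% Multiplying by \lambda_i \ge 0 preserves each inequality, and summing gives
\sum_{i=1}^{m} \lambda_i \bigl[F_i(A \cup \{d\}) - F_i(A)\bigr]
  \;\ge\; \sum_{i=1}^{m} \lambda_i \bigl[F_i(B \cup \{d\}) - F_i(B)\bigr],
% i.e. F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B), so F is submodular.
```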
Objective: pick k docs that cover most concepts
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL covered by the documents "Enthusiasm for Inauguration wanes" and "Inauguration weekend"]
F(A): the number of concepts covered by A
▪ Elements are concepts; each document's set is the concepts it mentions
▪ F(A) is submodular and monotone!
▪ So we can use the greedy algorithm to optimize F
Objective: pick k docs that cover most concepts
The good:
▪ Penalizes redundancy
▪ Submodular
The bad:
▪ Ignores concept importance
▪ All-or-nothing coverage is too harsh
Objective: pick k docs that cover most concepts
Each concept c has an importance weight w_c
Document coverage function: cover_d(c) is the
probability that document d covers concept c
[e.g., how strongly d covers c]
[Figure: the document "Enthusiasm for Inauguration wanes" covers the concept Obama strongly and Romney weakly]
Document coverage function:
cover_d(c) = probability that document d covers concept c
▪ cover_d(c) can also model how relevant concept c is for user u
Set coverage function:
▪ The probability that at least one document in A covers c:
cover_A(c) = 1 − ∏_{d∈A} (1 − cover_d(c))
Objective, with concept weights w_c:
F(A) = Σ_c w_c · cover_A(c)
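A minimal sketch of this probabilistic-coverage objective in Python (the documents, probabilities, and weights are made-up examples):

```python
import math

# cover[d][c]: hypothetical probability that document d covers concept c
cover = {"doc1": {"Obama": 0.8, "Romney": 0.1},
         "doc2": {"Obama": 0.3, "Argo": 0.9}}
w = {"Obama": 0.5, "Romney": 0.2, "Argo": 0.3}   # concept importance weights

def F(A):
    """Weighted probability that at least one doc in A covers each concept."""
    total = 0.0
    for c, wc in w.items():
        p_uncovered = math.prod(1 - cover[d].get(c, 0.0) for d in A)
        total += wc * (1 - p_uncovered)
    return total

print(F({"doc1"}), F({"doc1", "doc2"}))  # adding doc2 gives a diminishing gain
```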
This objective function is also submodular
▪ Intuitively, it has the diminishing-returns property
▪ So the greedy algorithm gives a (1 − 1/e) ≈ 63% approximation, i.e., a near-optimal solution
Objective: pick k docs that cover most concepts
▪ Each concept c has an importance weight w_c
▪ Documents partially cover concepts: cover_d(c)
The greedy algorithm is slow!
▪ At each iteration we must re-evaluate the marginal gain F(A ∪ {x}) − F(A) of every remaining document, then add the document with the highest marginal gain
▪ Runtime O(|D| · K) function evaluations for selecting K documents out of a set D of them
[Figure: bar chart of marginal gains for candidate documents a, b, c, d]
[Leskovec et al., KDD '07]
In round i: so far we have A_{i−1} = {d₁, …, d_{i−1}}
▪ Now we pick d_i = argmax_{d∈V} F(A_{i−1} ∪ {d}) − F(A_{i−1})
▪ The greedy algorithm maximizes the "marginal benefit"
Δ_i(d) = F(A_{i−1} ∪ {d}) − F(A_{i−1})
By the submodularity property:
F(A_i ∪ {d}) − F(A_i) ≥ F(A_j ∪ {d}) − F(A_j) for i < j
Observation: By submodularity, for every d ∈ D:
Δ_i(d) ≥ Δ_j(d) for i < j, since A_i ⊆ A_j
Marginal benefits Δ_i(d) only shrink as i grows!
(Selecting document d in step i covers more new words than selecting d at step j > i)
[Figure: Δ_i(d) ≥ Δ_j(d) for a document d]
[Leskovec et al., KDD '07]
Idea:
▪ Use Δ_i as an upper bound on Δ_j (j > i)
Lazy Greedy:
▪ Keep an ordered list of the marginal benefits Δ_i from the previous iteration
▪ Re-evaluate Δ_i only for the top element
▪ If it stays on top, select it; otherwise re-sort and prune
This works because F(A ∪ {d}) − F(A) ≥ F(B ∪ {d}) − F(B) for A ⊆ B, so stale gains are valid upper bounds
[Figure: iteration 1 – documents a–e sorted by (upper bounds on) marginal gain, a is selected, A₁ = {a}; iteration 2 – only the top stale bounds are re-evaluated, b is selected, A₂ = {a, b}]
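A minimal sketch of Lazy Greedy using a max-heap of stale gains (F here stands for any monotone submodular set function; everything below is illustrative, not the paper's code):

```python
import heapq

def lazy_greedy(F, docs, k):
    """Select k docs (approximately) maximizing a monotone submodular F."""
    A = []
    # Max-heap (via negated values) of stale upper bounds on marginal gains,
    # initialized with each document's singleton gain.
    heap = [(-(F([d]) - F([])), d) for d in docs]
    heapq.heapify(heap)
    while len(A) < k and heap:
        _, d = heapq.heappop(heap)
        gain = F(A + [d]) - F(A)               # refresh this stale bound
        if not heap or gain >= -heap[0][0]:
            A.append(d)                        # still the best: select it
        else:
            heapq.heappush(heap, (-gain, d))   # reinsert with the tighter bound
    return A
```

Because marginal gains only shrink, a refreshed gain that still beats every other (stale, hence optimistic) bound must belong to the true argmax, so most documents are never re-evaluated.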
Summary so far:
▪ Diversity can be formulated as a set-cover objective
▪ Set cover is a submodular optimization problem
▪ It can be (approximately) solved using the greedy algorithm
▪ Lazy Greedy gives a significant speedup
[Figure: running time in seconds (0–400, lower is better) vs. number of blogs selected (1–10): exhaustive search over all subsets is slowest, naive greedy is much faster, and Lazy Greedy is fastest]
But what about personalization?
[Figure: a recommendation model produces personalized picks such as "Election trouble", "Songs of Syria", and "Sandy delays"]
We assumed the same concept weighting for all users
[Figure: a single shared weight profile over the concepts (France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL) producing the picks "France intervenes", "Chuck for Defense", "Argo wins big"]
But each user has different preferences over concepts
[Figure: two weight profiles over the same concepts – a "politico" and a "movie buff" put high weight on different concepts]
Assume each user u has a different preference vector w_c(u) over concepts c
Goal: Learn the personal concept weights from user feedback
[Figure: the recommended headlines ("France intervenes", "Chuck for Defense", "Argo wins big") shown to a user, whose feedback will update the concept weights]
Multiplicative Weights algorithm:
▪ Assume each concept c has weight w_c
▪ We recommend document d and receive feedback, say r = +1 or −1
▪ Update the weights:
▪ For each c ∈ X_d set w_c = β^r · w_c
▪ If concept c appears in doc d and we received positive feedback (r = +1),
we increase the weight w_c by multiplying it by β (β > 1);
otherwise we decrease the weight (divide by β)
▪ Normalize the weights so that Σ_c w_c = 1
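A minimal sketch of this update rule in Python (β and the example weights are made up):

```python
BETA = 1.5  # beta > 1: how strongly one piece of feedback reshapes the weights

def mw_update(w, X_d, r):
    """Multiplicative-weights update after feedback r in {+1, -1} on a doc.

    w: dict concept -> weight; X_d: set of concepts the doc covers.
    """
    for c in X_d:
        w[c] *= BETA ** r          # multiply by beta if r = +1, divide if r = -1
    total = sum(w.values())
    return {c: wc / total for c, wc in w.items()}  # renormalize to sum to 1

w = {"France": 0.25, "Hagel": 0.25, "Argo": 0.25, "NFL": 0.25}
w = mw_update(w, {"Argo"}, +1)     # user liked a movie story
print(w)                           # Argo's weight rises; the rest shrink slightly
```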
Steps of the algorithm:
1. Identify the pool of items to recommend from
2. Identify concepts [what makes items redundant?]
3. Weigh concepts by general importance
4. Define the item-concept coverage function cover_d(c)
5. Select items using probabilistic set cover (greedy / Lazy Greedy)
6. Obtain user feedback, update the weights