codingbooks/tidy-text-mining@dd513e9

Commit dd513e9

Change some plots, more edits based on tech reviews, etc
1 parent 92b3356 commit dd513e9

3 files changed: 21 additions and 17 deletions

03-tf-idf.Rmd

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ physics %>%
   select(text)
 ```

-Maybe it makes sense to keep this one. Also notice that in this line we have "co-ordinate", which explains why there are separate "co" and "ordinate" items in the high tf-idf words for the Einstein text; the `unnest_tokens()` function separates around punctuation.
+Maybe it makes sense to keep this one. Also notice that in this line we have "co-ordinate", which explains why there are separate "co" and "ordinate" items in the high tf-idf words for the Einstein text; the `unnest_tokens()` function separates around punctuation. Notice that the tf-idf scores for "co" and "ordinate" are close to same!

 "AB", "RC", and so forth are names of rays, circles, angles, and so forth for Huygens.
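The splitting behaviour described in this change can be reproduced on a toy sentence. This is a minimal sketch, assuming the dplyr and tidytext packages are installed; the sentence and the `toy` data frame are made up for illustration and are not taken from the Einstein text.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-line document; not a line from the physics corpus
toy <- tibble(text = "The co-ordinate system is fixed.")

# unnest_tokens() tokenizes into words by default, breaking on punctuation
# such as hyphens and lowercasing the result
tokens <- toy %>%
  unnest_tokens(word, text)

# "co" and "ordinate" appear as separate tokens; "co-ordinate" does not
tokens$word
```

If the two fragments only ever arise from the hyphenated word, they share the same counts in every document, which is consistent with their nearly identical tf-idf scores.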

07-tweet-archives.Rmd

Lines changed: 4 additions & 4 deletions
@@ -14,7 +14,7 @@ One type of text that gets plenty of attention is text shared online via Twitter

 An individual can download their own Twitter archive by following [directions available on Twitter's website](https://support.twitter.com/articles/20170160). We each downloaded ours and will now open them up. Let's use the lubridate package to convert the string timestamps to date-time objects and initially take a look at our tweeting patterns overall (Figure \@ref(fig:setup)).

-```{r setup, fig.width=7, fig.height=6, fig.cap="All tweets from our accounts"}
+```{r setup, fig.width=7, fig.height=7, fig.cap="All tweets from our accounts"}
 library(lubridate)
 library(ggplot2)
 library(dplyr)
@@ -129,8 +129,7 @@ word_ratios <- tidy_tweets %>%
   count(word, person) %>%
   filter(sum(n) >= 10) %>%
   spread(person, n, fill = 0) %>%
-  ungroup() %>%
-  mutate_each(funs((. + 1) / sum(. + 1)), -word) %>%
+  mutate_if(is.numeric, funs((. + 1) / sum(. + 1))) %>%
   mutate(logratio = log(David / Julia)) %>%
   arrange(desc(logratio))
 ```
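The smoothing step in this hunk adds 1 to every count before converting each person's column to proportions, so a word one person never used still gets a finite log ratio. Below is a minimal sketch of the same calculation on made-up counts. It assumes dplyr 1.0.0 or later and uses `across()` in place of `funs()`, which later dplyr releases deprecated; the `word_counts` table is hypothetical and is not the authors' tweet data.

```r
library(dplyr)

# Hypothetical per-person word counts, already spread into one column each
word_counts <- tibble(
  word  = c("rstats", "data", "blog"),
  David = c(30, 10, 0),
  Julia = c(5, 10, 20)
)

word_ratios <- word_counts %>%
  # Add-one smoothing, then normalize each numeric column to sum to 1
  mutate(across(where(is.numeric), ~ (.x + 1) / sum(.x + 1))) %>%
  # Positive values lean toward David, negative values toward Julia
  mutate(logratio = log(David / Julia)) %>%
  arrange(desc(logratio))

word_ratios
```

Here "blog" has a zero count for David, yet its log ratio is a finite negative number rather than `-Inf` because of the added 1.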
@@ -148,8 +147,9 @@ Which words are most likely to be from Julia's account or from David's account?

 ```{r plotratios, dependson = "word_ratios", fig.width=7, fig.height=6, fig.cap="Comparing the odds ratios of words from our accounts"}
 word_ratios %>%
+  mutate(abslogratio = abs(logratio)) %>%
   group_by(logratio < 0) %>%
-  top_n(15, abs(logratio)) %>%
+  top_n(15, abslogratio) %>%
   ungroup() %>%
   mutate(word = reorder(word, logratio)) %>%
   ggplot(aes(word, logratio, fill = logratio < 0)) +

08-nasa-metadata.Rmd

Lines changed: 16 additions & 12 deletions
@@ -195,9 +195,10 @@ title_word_pairs %>%
   filter(n >= 250) %>%
   graph_from_data_frame() %>%
   ggraph(layout = "fr") +
-  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
-  geom_node_point(color = "darkslategray4", size = 5) +
-  geom_node_text(aes(label = name), repel = TRUE) +
+  geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "cyan4") +
+  geom_node_point(size = 5) +
+  geom_node_text(aes(label = name), repel = TRUE,
+                 point.padding = unit(0.2, "lines")) +
   theme_void()
 ```
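The restyled network plots in this file all follow the same pattern: the colour moves from the nodes to the edges, and the repelled labels get a little padding around each point. Below is a minimal sketch of that pattern, assuming the igraph, ggraph, and ggplot2 packages are installed; the `word_pairs` table and its counts are invented for illustration and are not the NASA metadata.

```r
library(igraph)
library(ggplot2)
library(ggraph)

# Hypothetical co-occurrence counts; columns mirror item1 / item2 / n
word_pairs <- data.frame(
  item1 = c("earth", "data", "data"),
  item2 = c("science", "system", "science"),
  n     = c(300, 450, 280)
)

# The third column (n) becomes an edge attribute of the graph
graph <- graph_from_data_frame(word_pairs)

set.seed(1234)  # the "fr" layout is randomized, so fix the seed
p <- ggraph(graph, layout = "fr") +
  # Edge colour is a fixed setting; alpha and width are mapped to n
  geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "cyan4") +
  geom_node_point(size = 5) +
  # point.padding keeps repelled labels slightly away from the node points
  geom_node_text(aes(label = name), repel = TRUE,
                 point.padding = unit(0.2, "lines")) +
  theme_void()

p
```

Setting `edge_colour` outside `aes()` applies a single colour to every edge, while `edge_alpha` and `edge_width` inside `aes()` still vary with the pair count.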

@@ -211,9 +212,10 @@ desc_word_pairs %>%
   filter(n >= 5000) %>%
   graph_from_data_frame() %>%
   ggraph(layout = "fr") +
-  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
-  geom_node_point(color = "indianred4", size = 5) +
-  geom_node_text(aes(label = name), repel = TRUE) +
+  geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "darkred") +
+  geom_node_point(size = 5) +
+  geom_node_text(aes(label = name), repel = TRUE,
+                 point.padding = unit(0.2, "lines")) +
   theme_void()

 ```
@@ -235,9 +237,10 @@ keyword_pairs %>%
   filter(n >= 700) %>%
   graph_from_data_frame() %>%
   ggraph(layout = "fr") +
-  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
-  geom_node_point(color = "royalblue3", size = 5) +
-  geom_node_text(aes(label = name), repel = TRUE) +
+  geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "royalblue") +
+  geom_node_point(size = 5) +
+  geom_node_text(aes(label = name), repel = TRUE,
+                 point.padding = unit(0.2, "lines")) +
   theme_void()
 ```

@@ -268,9 +271,10 @@ keyword_cors %>%
   filter(correlation > .6) %>%
   graph_from_data_frame() %>%
   ggraph(layout = "fr") +
-  geom_edge_link(aes(edge_alpha = correlation, edge_width = correlation)) +
-  geom_node_point(color = "royalblue3", size = 5) +
-  geom_node_text(aes(label = name), repel = TRUE) +
+  geom_edge_link(aes(edge_alpha = correlation, edge_width = correlation), edge_colour = "royalblue") +
+  geom_node_point(size = 5) +
+  geom_node_text(aes(label = name), repel = TRUE,
+                 point.padding = unit(0.2, "lines")) +
   theme_void()
 ```
