05-document-term-matrices.Rmd (4 additions, 3 deletions)
@@ -310,7 +310,8 @@ Each of the items in the `corpus` list column is a `WebCorpus` object, which is

 ```{r stock_tokens, dependson = "stock_articles"}
 stock_tokens <- stock_articles %>%
-  unnest(map(corpus, tidy)) %>%
+  mutate(corpus = map(corpus, tidy)) %>%
+  unnest(cols = (corpus)) %>%
   unnest_tokens(word, text) %>%
   select(company, datetimestamp, word, id, heading)

@@ -319,7 +320,7 @@ stock_tokens

 Here we see some of each article's metadata alongside the words used. We could use tf-idf to determine which words were most specific to each stock symbol.
 The top terms for each are visualized in Figure \@ref(fig:stocktfidf). As we'd expect, the company's name and symbol are typically included, but so are several of their product offerings and executives, as well as companies they are making deals with (such as Disney with Netflix).

-```{r stocktfidf, dependson = "stock_tf_idf", echo = FALSE, fig.cap = "The 8 words with the highest tf-idf in recent articles specific to each company", fig.height = 8, fig.width = 8}
+```{r stocktfidf, dependson = "stocktfidfdata", echo = FALSE, fig.cap = "The 8 words with the highest tf-idf in recent articles specific to each company", fig.height = 8, fig.width = 8}
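The first hunk replaces the older `unnest(map(corpus, tidy))` idiom with a two-step form: build the list column with `mutate()`, then name that column in `unnest(cols = ...)`. To our understanding this tracks the interface change in tidyr 1.0.0, where passing expressions directly to `unnest()` was deprecated. The sketch below shows the same pattern on a small hypothetical tibble; `toy`, `docs`, and `toy_tidy` are illustrative names, not from the book, and it assumes dplyr, purrr, and tidyr 1.0 or later are installed.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical toy data: one row per company, with a list column of raw text
toy <- tibble(
  company = c("A", "B"),
  docs = list(c("first doc", "second doc"), "only doc")
)

# Step 1: turn each list element into a tibble with mutate() + map()
# Step 2: expand that list column into rows by naming it in `cols`
toy_tidy <- toy %>%
  mutate(docs = map(docs, ~ tibble(text = .x))) %>%
  unnest(cols = c(docs))

toy_tidy
```

Each element of `docs` becomes a one-column tibble, and `unnest()` expands those into one row per document, repeating `company` as needed; this mirrors how `map(corpus, tidy)` produces per-company tibbles that are then unnested.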