Commit cc108e0

David Robinson authored and committed
Changed blocks to notes, in a way that can be compiled for O'Reilly
1 parent 876f7f5 commit cc108e0

4 files changed: +11 -11 lines changed

04-word-combinations.Rmd

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@ austen_bigrams

 This data structure is still a variation of the tidy text format. It is structured as one-token-per-row (with extra metadata, such as `book`, still preserved), but each token now represents a bigram.

-```{block, type = "rmdnote"}
+```NOTE
 Notice that these bigrams overlap: "sense and" is one token, while "and sensibility" is another.
 ```
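For context, the `austen_bigrams` object named in the hunk header comes from bigram tokenization. A minimal sketch consistent with the chapter (package and column names as used in the book):

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# One token per row, where each token is a pair of adjacent words;
# metadata columns such as `book` are preserved by unnest_tokens().
austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)
```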

@@ -273,15 +273,15 @@ ggraph(bigram_graph, layout = "fr") +

 It may take some experimentation with ggraph to get your networks into a presentable format like this, but the network structure is a useful and flexible way to visualize relational tidy data.

-```{block, type = "rmdnote"}
+```NOTE
 Note that this is a visualization of a **Markov chain**, a common model in text processing. In a Markov chain, each choice of word depends only on the previous word. In this case, a random generator following this model might spit out "dear", then "sir", then "william/walter/thomas/thomas's", by following each word to the most common words that follow it. To make the visualization interpretable, we chose to show only the most common word-to-word connections, but one could imagine an enormous graph representing all connections that occur in the text.
 ```

 ### Visualizing bigrams in other texts

 We put a good amount of work into cleaning and visualizing bigrams on a text dataset, so let's collect it into a function so that we can easily perform it on other text datasets.

-```{block, type = "rmdnote"}
+```NOTE
 To make it easy to use the `count_bigrams()` and `visualize_bigrams()` functions yourself, we've also reloaded the packages necessary for them.
 ```
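The Markov chain note in the hunk above is easy to make concrete. A toy sketch, not from the book, that follows each word to its most common successor in a `bigram_counts` table (columns `word1`, `word2`, `n`, as built earlier in the chapter):

```r
library(dplyr)

follow_chain <- function(bigram_counts, start, steps = 5) {
  out <- start
  current <- start
  for (i in seq_len(steps)) {
    # The most common word following the current one
    nxt <- bigram_counts %>%
      filter(word1 == current) %>%
      slice_max(n, n = 1, with_ties = FALSE) %>%
      pull(word2)
    if (length(nxt) == 0) break
    out <- c(out, nxt)
    current <- nxt
  }
  out
}

# follow_chain(bigram_counts, "dear")  # might yield "dear" "sir" ...
```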
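The bodies of `count_bigrams()` and `visualize_bigrams()` aren't shown in this diff; a plausible implementation consistent with the chapter's pipeline (bigram tokenization, stop-word filtering, ggraph plotting) would be:

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(igraph)
library(ggraph)  # attaches ggplot2

count_bigrams <- function(dataset) {
  dataset %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    filter(!word1 %in% stop_words$word,
           !word2 %in% stop_words$word) %>%
    count(word1, word2, sort = TRUE)
}

visualize_bigrams <- function(bigrams) {
  bigrams %>%
    graph_from_data_frame() %>%
    ggraph(layout = "fr") +
    geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
    geom_node_point(color = "lightblue", size = 5) +
    geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
    theme_void()
}
```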

@@ -410,7 +410,7 @@ For example, that $n_{11}$ represents the number of documents where both word X

 $$\phi=\frac{n_{11}n_{00}-n_{10}n_{01}}{\sqrt{n_{1\cdot}n_{0\cdot}n_{\cdot0}n_{\cdot1}}}$$

-```{block, type = "rmdnote"}
+```NOTE
 The phi coefficient is equivalent to the Pearson correlation, which you may have heard of elsewhere, when it is applied to binary data.
 ```
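The note's equivalence claim is easy to verify numerically. A quick check with made-up binary vectors (any 0/1 data would do):

```r
# Toy binary data
x <- c(1, 1, 1, 0, 0, 1, 0, 0, 1, 0)
y <- c(1, 0, 1, 0, 0, 1, 0, 1, 1, 0)

# Cell counts of the 2x2 table
n11 <- sum(x == 1 & y == 1)
n10 <- sum(x == 1 & y == 0)
n01 <- sum(x == 0 & y == 1)
n00 <- sum(x == 0 & y == 0)

# Phi as defined in the equation above (margin totals in the denominator)
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

phi        # 0.6
cor(x, y)  # 0.6, the Pearson correlation agrees
```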

05-document-term-matrices.Rmd

Lines changed: 3 additions & 3 deletions
@@ -69,7 +69,7 @@ ap_td

 Notice that we now have a tidy three-column `tbl_df`, with variables `document`, `term`, and `count`. This tidying operation is similar to the `melt()` function from the reshape2 package [@R-reshape2] for non-sparse matrices.

-```{block, type = "rmdnote"}
+```NOTE
 Notice that only the non-zero values are included in the tidied output: document 1 includes terms such as "adding" and "adult", but not "aaron" or "abandon". This means the tidied version has no rows where `count` is zero.
 ```
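For context, the tidying step the hunk describes is a one-liner; a minimal sketch using the `AssociatedPress` matrix that ships with topicmodels:

```r
library(tidytext)

# AssociatedPress is a sparse DocumentTermMatrix; tidy() converts it to
# one row per non-zero document-term pair.
data("AssociatedPress", package = "topicmodels")
ap_td <- tidy(AssociatedPress)
ap_td  # columns: document, term, count
```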

@@ -153,7 +153,7 @@ inaug_tf_idf %>%

 As another example of a visualization possible with tidy data, we could extract the year from each document's name, and compute the total number of words within each year.

-```{block, type = "rmdnote"}
+```NOTE
 Note that we've used tidyr's `complete()` function to include zeroes (cases where a word didn't appear in a document) in the table.
 ```
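A sketch of the described steps, assuming a tidied inaugural-address table `inaug_td` with columns `document`, `term`, and `count` (the object name and regex are assumptions consistent with the chapter):

```r
library(dplyr)
library(tidyr)

year_term_counts <- inaug_td %>%
  # Pull the numeric year out of document names like "1793-Washington"
  extract(document, "year", "(\\d+)", convert = TRUE) %>%
  # Fill in zero counts for year/term pairs that never occur
  complete(year, term, fill = list(count = 0)) %>%
  group_by(year) %>%
  mutate(year_total = sum(count)) %>%
  ungroup()
```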

@@ -276,7 +276,7 @@ acq_tokens %>%

 Here we'll retrieve recent articles relevant to nine major technology stocks: Microsoft, Apple, Google, Amazon, Facebook, Twitter, IBM, Yahoo, and Netflix.

-```{block, type = "rmdnote"}
+```NOTE
 These results were downloaded in January 2017, when this chapter was written, but you'll certainly find different results if you run it yourself. Note that this code takes several minutes to run.
 ```
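The download code itself isn't shown in this hunk. A sketch of the approach the chapter used, via the tm.plugin.webmining package as it worked in early 2017 (the helper name is an assumption, and the underlying feeds have since changed):

```r
library(dplyr)
library(purrr)
library(tm.plugin.webmining)

company <- c("Microsoft", "Apple", "Google", "Amazon", "Facebook",
             "Twitter", "IBM", "Yahoo", "Netflix")
symbol <- c("MSFT", "AAPL", "GOOG", "AMZN", "FB",
            "TWTR", "IBM", "YHOO", "NFLX")

# Fetch a corpus of recent articles for one ticker
download_articles <- function(symbol) {
  WebCorpus(GoogleFinanceSource(paste0("NASDAQ:", symbol)))
}

stock_articles <- tibble(company = company, symbol = symbol) %>%
  mutate(corpus = map(symbol, download_articles))
```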

06-topic-models.Rmd

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ AssociatedPress

 We can use the `LDA()` function from the topicmodels package, setting `k = 2`, to create a two-topic LDA model.

-```{block, type = "rmdnote"}
+```NOTE
 Almost any topic model in practice will use a larger `k`, but we will soon see that this analysis approach extends to a larger number of topics.
 ```
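A sketch of the call the text describes; the seed is there so the randomized initialization is reproducible (the published chapter uses `seed = 1234`):

```r
library(topicmodels)

data("AssociatedPress", package = "topicmodels")

# A two-topic LDA model of the AP articles
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))
ap_lda
```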

09-usenet.Rmd

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@ In our final chapter, we'll use what we've learned in this book to perform a sta

 We'll start by reading in all the messages from the `20news-bydate` folder, which are organized in sub-folders with one file for each message. We can read in files like these with a combination of `read_lines()`, `map()` and `unnest()`.

-```{block, type = "rmdwarning"}
+```WARNING
 Note that this step may take several minutes to read all the documents.
 ```
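A sketch of the reading step, with the folder path and helper name as assumptions; the pattern is `read_lines()` per file, `map()` over files, and `unnest()` into one row per line:

```r
library(dplyr)
library(purrr)
library(readr)
library(tidyr)

training_folder <- "data/20news-bydate/20news-bydate-train/"

# Read every file in one newsgroup folder into a one-line-per-row table
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}

raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
  mutate(folder_out = map(folder, read_folder)) %>%
  unnest(folder_out) %>%
  transmute(newsgroup = basename(folder), id, text)
```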

@@ -86,7 +86,7 @@ cleaned_text <- raw_text %>%

 Many lines also have nested text representing quotes from other users, typically starting with a line like "so-and-so writes..." These can be removed with a few regular expressions.

-```{block, type = "rmdnote"}
+```NOTE
 We also choose to manually remove two messages, `9704` and `9985`, that contained a large amount of non-text content.
 ```
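The regular expressions aren't shown in this hunk; a sketch of what such a filter could look like, with the patterns as assumptions modeled on the text's description:

```r
library(dplyr)
library(stringr)

cleaned_text <- cleaned_text %>%
  filter(# keep blank lines and lines that don't start with quote markers
         str_detect(text, "^[^>]+[A-Za-z\\d]") | text == "",
         # drop "so-and-so writes:" / "writes..." attribution lines
         !str_detect(text, "writes(:|\\.\\.\\.)$"),
         # drop Usenet "In article <...>" reply headers
         !str_detect(text, "^In article <"),
         # drop the two messages with large non-text payloads
         !id %in% c(9704, 9985))
```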

@@ -361,7 +361,7 @@ sentiment_messages <- usenet_words %>%
   filter(words >= 5)
 ```

-```{block, type = "rmdnote"}
+```NOTE
 As a simple measure to reduce the role of randomness, we filtered out messages that had fewer than five words that contributed to sentiment.
 ```
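For context, a sketch of the pipeline whose tail the hunk shows. Column names are assumptions: current tidytext returns the AFINN score in a `value` column (it was `score` when the book was written):

```r
library(dplyr)
library(tidytext)

sentiment_messages <- usenet_words %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(newsgroup, id) %>%
  # Average AFINN value per message, plus how many words contributed
  summarize(sentiment = mean(value), words = n()) %>%
  ungroup() %>%
  filter(words >= 5)
```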
