Creating a word cloud on R-bloggers posts
This post will go through how to create a word cloud of article titles scraped from the awesome R-bloggers. Our goal will be to use R's rvest package to search through 50 successive pages on the site for article titles. The stringr and tm packages will be used for string cleaning and for creating a term document frequency matrix (with tm). We will then create a word cloud based off the words comprising these titles. First, we'll load the packages we need. [code lang="R"] # load packages library(rvest) library(stringr) library(tm) library(wordcloud) [/code] Let's write a function that will take a webpage as input and return all the scraped article titles. [code lang="R"] scrape_post_titles <- function(site) { # scrape HTML from input site source_html <- read_html(site) # grab the title attributes…