BeautifulSoup vs. Rvest

Python, R, Web Scraping
This post will compare Python's BeautifulSoup package to R's rvest package for web scraping. We'll also talk about additional functionality in rvest (that doesn't exist in BeautifulSoup) in comparison to a couple of other Python packages (including pandas and RoboBrowser).

Getting started

BeautifulSoup and rvest both involve creating an object that we can use to parse the HTML from a webpage. However, one immediate difference is that BeautifulSoup is just a web parser, so it doesn't connect to webpages. rvest, on the other hand, can connect to a webpage and scrape / parse its HTML in a single package.

In BeautifulSoup, our initial setup looks like this:

[code lang="python"]
# load packages
from bs4 import BeautifulSoup
import requests

# connect to webpage
resp = requests.get("https://www.azlyrics.com/b/beatles.html")

# get BeautifulSoup object
soup…
Testing the Collatz Conjecture with R

R
Background

The Collatz Conjecture is a famous unsolved problem in number theory. If you're not familiar with it, the conjecture is very simple to understand, yet no one has been able to prove mathematically that it is true (though it has been verified for an enormous number of cases). The conjecture states the following:

Start with any positive whole number. If the number is even, divide it by two. If the number is odd, multiply it by three and add one. Then repeat this logic with the resulting number. Eventually you'll end up with the number one.

Effectively, this can be written like this:

For any positive whole number n:
If n mod 2 == 0, then n = n / 2
Else n = 3 * n…
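
As a rough illustration of the rule stated in the conjecture (halve an even number, otherwise multiply by three and add one, and repeat until reaching one), here is a minimal sketch. It is written in Python rather than R and is not the post's own code; the function name `collatz_steps` is hypothetical and chosen only for this example.

```python
def collatz_steps(n: int) -> int:
    """Count how many applications of the Collatz rule it takes for n to reach 1.

    Hypothetical helper for illustration only. The loop would never end
    for a starting value that does not reach 1, which is exactly what
    the conjecture says cannot happen.
    """
    if n < 1:
        raise ValueError("n must be a positive whole number")

    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2          # even: divide by two
        else:
            n = 3 * n + 1    # odd: multiply by three and add one
        steps += 1
    return steps


# 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
print(collatz_steps(6))  # 8
```

Checking a range of starting values this way can support the conjecture for those values, but it cannot prove it in general.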