Web Browsing and Parsing with RoboBrowser and requests_html

Python, Web Scraping
Background

So you've learned all about BeautifulSoup. What's next?

Python is a great language for automating web operations. In a previous article we went through how to use BeautifulSoup and requests to scrape stock-related articles from Nasdaq's website. This post talks about a couple of alternatives to using BeautifulSoup directly.

One way of scraping and crawling the web is to use Python's RoboBrowser package, which is built on top of requests and BeautifulSoup. Because it builds on both of these packages, the code needed to scrape the web is somewhat simpler, as we'll see below. RoboBrowser works similarly to the older Python 2.x package mechanize in that it allows you to simulate a web browser.

A second option is using requests_html, which was also discussed here, and which we'll also…
Scraping data from a JavaScript webpage with Python

Python, Web Scraping
This post will walk through how to use the requests_html package to scrape options data from a JavaScript-rendered webpage. requests_html serves as an alternative to Selenium and PhantomJS, and provides a clear syntax similar to the awesome requests package. The code we'll walk through is packaged into functions in the options module in the yahoo_fin package, but this article will show how to write the code from scratch using requests_html so that you can use the same idea to scrape other JavaScript-rendered webpages.

Note: requests_html requires Python 3.6+. If you don't have requests_html installed, you can download it using pip:

[cc]
pip install requests_html
[/cc]

Motivation

Let's say we want to scrape options data for a particular stock. As an example, let's look at Netflix (since it's well known). If…