Web Scraping Archives - Page 2 of 2 - Open Source Automation

12 Oct 2017 by Andrew Treadway

Word Frequency Analysis

In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we’re going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles.

Initial Setup

Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles.

[code language="python" style="font-size: 8px"]
'''load packages'''
import nltk
import requests
from bs4 import BeautifulSoup
[/code]

Pulling the data we'll need

Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the actual raw text from articles linked to from NASDAQ's website.

[code language="python"]
def scrape_news_text(news_url):
    news_html…
[/code]
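The counting step this excerpt describes can be sketched as follows, assuming the article text is already available as a plain string. To stay dependency-free, the sketch uses the standard library's collections.Counter instead of NLTK; NLTK's FreqDist is a Counter subclass, so it exposes the same most_common method. The function name word_frequencies, the simple regex tokenizer, and the sample text are all hypothetical and are not taken from the original post.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased word occurs in text."""
    # Crude tokenizer (an assumption, not NLTK's): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample standing in for scraped article text.
sample_text = "Netflix shares rose. Netflix subscribers rose too."
freqs = word_frequencies(sample_text)

# Show the two most frequent words with their counts.
print(freqs.most_common(2))
```

With real scraped text, the same function could be applied to each article string in turn, or to all of them joined together.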

RoboBrowser: Automating Online Forms

Python, Web Scraping

Background

RoboBrowser is a Python 3.x package for crawling through the web and submitting online forms. It works similarly to the older Python 2.x package, mechanize. This post gives a simple introduction to using RoboBrowser to submit a form on Wunderground for scraping historical weather data.

Initial setup

RoboBrowser can be installed via pip:

[code lang="python"]
pip install robobrowser
[/code]

Let's do the initial setup of the script by loading the RoboBrowser package. We'll also load pandas, as we'll be using that a little bit later.

[code lang="python"]
from robobrowser import RoboBrowser
import pandas as pd
[/code]

Create RoboBrowser Object

Next, we create a RoboBrowser object. This object functions similarly to an actual web browser. It allows you to navigate to different websites, fill in forms, and get…
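To make "submitting a form" concrete, here is a minimal standard-library sketch of what a browser does for a simple GET form: it resolves the form's action against the page URL and encodes the field values into a query string. This is not RoboBrowser's API and not the code from the post; the function name build_form_request, the example.com URL, and the field names are all hypothetical.

```python
from urllib.parse import urlencode, urljoin


def build_form_request(page_url, action, fields):
    """Return the URL a browser would request when submitting a GET form."""
    # Resolve the form's action relative to the page it appears on.
    target = urljoin(page_url, action)
    # Encode the field name/value pairs as a query string.
    return target + "?" + urlencode(fields)


# Hypothetical page, action, and field names, for illustration only.
url = build_form_request(
    "https://example.com/history/",
    "search",
    {"location": "New York, NY", "month": 10},
)
print(url)
```

A browser-automation package handles these steps, plus cookies and POST bodies, on your behalf once the form fields are filled in.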

Scraping Articles About Stocks

Python, Web Scraping

See recommended books here.

The following article will show you an example of how to scrape articles about stocks from the Web using Python 3. Specifically, we'll be looking at articles linked from http://www.nasdaq.com. If you're not familiar with list comprehensions, you may want to check this, as we'll be using them in our code.

Initial, Specific Example

Let's start with a specific stock, say Netflix. Articles linked to a specific stock ticker on Nasdaq's website follow this URL pattern: http://www.nasdaq.com/symbol/TICKER/news-headlines, where TICKER is replaced with whatever ticker you want. In our case, we will start by dealing specifically with Netflix's (NFLX) stock. So our site of interest is:

http://www.nasdaq.com/symbol/nflx/news-headlines

The first step is to load the requests and BeautifulSoup packages. Here, we'll also set the variable site equal to…
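The URL pattern quoted in this excerpt can be turned into a small helper, and a list comprehension then builds the URL for several tickers at once. This is only a sketch: the function name news_headlines_url is hypothetical, and the tickers other than NFLX are assumptions added for illustration. The ticker is lowercased to match the nflx example above.

```python
def news_headlines_url(ticker):
    """Build the news-headlines URL for a ticker, following the quoted pattern."""
    # The example URL in the excerpt uses the lowercase ticker (nflx).
    return "http://www.nasdaq.com/symbol/{}/news-headlines".format(ticker.lower())


# NFLX comes from the article; the other tickers are illustrative assumptions.
tickers = ["NFLX", "AAPL", "MSFT"]

# List comprehension: one URL per ticker, in the same order.
urls = [news_headlines_url(ticker) for ticker in tickers]

for url in urls:
    print(url)
```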
