Running R Code in Parallel


R
Background

Running R code in parallel can speed up your code considerably. Parallelization lets your code run multiple processes simultaneously, rather than iterating over a list one element at a time or running a single process at a time. Thankfully, running R code in parallel is relatively simple with the parallel package. This package provides parallelized versions of lapply, sapply, and apply (parLapply, parSapply, and parApply, plus the row- and column-wise parRapply and parCapply), along with the fork-based mclapply. Parallelizing code works best when you need to call a function or perform an operation on each element of a list or vector, and evaluating any particular element has no impact on the evaluation of any other element. This could be running a large number of models across different elements of a list, scraping…
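The posts on this page use Python for their code, so here is a hedged sketch of the same "independent elements" pattern using Python's standard-library concurrent.futures module rather than R's parallel package. One function is applied to every element of a list concurrently, and because no call depends on another, the result matches a plain sequential loop. The task function square_task and the worker count are hypothetical stand-ins. Threads suit I/O-bound work such as scraping; for CPU-bound Python code, ProcessPoolExecutor is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def square_task(x):
    """Hypothetical stand-in for one independent unit of work (fitting a model, fetching a page)."""
    return x * x


def parallel_map(func, items, max_workers=4):
    """Apply func to each element of items concurrently and return results in input order.

    Only safe when each call is independent: no call reads or writes
    state that another call depends on.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs,
        # regardless of which task finishes first.
        return list(pool.map(func, items))


items = list(range(10))
sequential = [square_task(x) for x in items]
in_parallel = parallel_map(square_task, items)
print(in_parallel == sequential)  # True
```

Because each element is processed independently, the parallel version can replace the loop without changing the answer; only the scheduling of the work differs.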
Word Frequency Analysis


Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we're going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles.

Initial Setup

Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles.

```python
# load packages
import nltk
import requests
from bs4 import BeautifulSoup
```

Pulling the data we'll need

Below, we're copying code from my article on scraping stock news. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the raw text from articles linked from NASDAQ's website.

```python
def scrape_news_text(news_url):
    news_html…
```
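The counting step itself can be sketched without a network connection. The block below is a minimal, hedged stand-in that uses collections.Counter and a simple regular-expression tokenizer instead of NLTK, so it runs with the standard library alone. Since nltk.FreqDist is a subclass of Counter, FreqDist(tokens).most_common(n) produces the same kind of ranking. The sample text and the tiny stopword set are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; a real analysis would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs in text."""
    return Counter(word for word in tokenize(text) if word not in stopwords)


# Made-up sample standing in for the raw text of one scraped article.
sample = "Shares of the company rose. Analysts expect shares to rise again, and the company agrees."
freqs = word_frequencies(sample)
print(freqs.most_common(3))  # [('shares', 2), ('company', 2), ('rose', 1)]
```

Dropping stopwords before counting keeps filler words like "the" and "and" from crowding the top of the ranking; ties keep the order in which words first appeared.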
Running Python from the Task Scheduler


Python, System Administration
Background

Running Python from the Windows Task Scheduler is a really useful capability. It lets you run Python in production on a Windows system and can save countless hours of work. For instance, many companies need to extract data from a database on an automated, regular basis.

How to run Python from the command line

Before we go into how to schedule a Python script to run, you need to understand how to run Python from the command line. To open the command prompt (command line), press the Windows key and type cmd into the search box. Next, suppose your Python script is called cool_python_script.py and is saved under C:\Users. You can run this script from the command prompt by typing the below…
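As a hedged example of the kind of job this setup is for, here is a sketch of what a script like cool_python_script.py might contain if it handled the database extraction mentioned above. Everything in it is an assumption rather than the post's code: the data is taken to live in a SQLite file with a hypothetical table named sales, and the export goes to a CSV file whose path is passed on the command line. The logic sits in a main() function that takes an argument list and returns an exit code, which makes the script easy to run unattended and to test.

```python
"""Sketch of a job suited to unattended runs (hypothetical, not the post's code).

Assumes a SQLite database containing a table named `sales`.
"""
import argparse
import csv
import sqlite3


def export_table(db_path, out_path, table="sales"):
    """Write every row of `table` to a CSV file (header first) and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # The table name is a trusted constant here, never user input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def main(argv=None):
    """Parse command-line arguments, run the export, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Export the sales table to CSV.")
    parser.add_argument("db_path", help="path to the SQLite database file")
    parser.add_argument("out_path", help="where to write the CSV output")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    count = export_table(args.db_path, args.out_path)
    print(f"Exported {count} rows to {args.out_path}")
    return 0  # zero conventionally means success
```

To run it from the command line, you would add the usual `if __name__ == "__main__": sys.exit(main())` guard (with `import sys`) and pass the two paths as arguments; the guard is left out here so the block can be imported without running. A zero exit code conventionally signals success to whatever launched the script.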