Categories: PythonWeb Scraping

How to download fundamentals data with Python

How can we download fundamentals data with Python?

In this post we will explore how to download fundamentals data with Python. We’ll be extracting fundamentals data from Yahoo Finance using the yahoo_fin package. For more on yahoo_fin, including installation instructions, check out its full documentation here or my YouTube video tutorials here.

Getting started

Now, let’s import the stock_info module from yahoo_fin. This will provide us with the functionality we need to scrape fundamentals data from Yahoo Finance. We’ll also import the pandas package as we’ll be using that later to work with data frames.


import yahoo_fin.stock_info as si
import pandas as pd

Next, we’ll dive into getting common company metrics, starting with P/E ratios.

How to get P/E (Price-to-Earnings) Ratios

There’s a couple ways to get the current P/E ratio for a company. First, we can use the get_quote_table method, which will extract the data found on the summary page of a stock (see here).


quote = si.get_quote_table("aapl")

Next, let’s pull the P/E ratio from the dictionary that is returned.


quote["PE Ratio (TTM)"] # 22.71

A company’s P/E ratio can also be extracted from the get_stats_valuation method. Running this method returns a data frame of the “Valuation Measures” on the statistics tab for a stock.


val = si.get_stats_valuation("aapl")

val = val.iloc[:,:2]

val.columns = ["Attribute", "Recent"]

Next, let’s extract the P/E ratio.


float(val[val.Attribute.str.contains("Trailing P/E")].iloc[0,1])

How to get P/S (Price-to-Sales) Ratios

Another popular metric is the P/S ratio. We can get the P/S ratio, along with several other other metrics, using the same get_stats_valuation method. Let’s use the object we pulled above, currently stored as val.

Then, we can get the Price/Sales ratio like below.


float(val[val.Attribute.str.contains("Price/Sales")].iloc[0,1])

Getting fundamentals stats for many stocks at once

Now, let’s get the Price-to-Earnings and Price-to-Sales ratios for each stock in the Dow. We could also do this for a custom list of tickers as well.


# get list of Dow tickers
dow_list = si.tickers_dow()


# Get data in the current column for each stock's valuation table
dow_stats = {}
for ticker in dow_list:
    temp = si.get_stats_valuation(ticker)
    temp = temp.iloc[:,:2]
    temp.columns = ["Attribute", "Recent"]

    dow_stats[ticker] = temp


# combine all the stats valuation tables into a single data frame
combined_stats = pd.concat(dow_stats)
combined_stats = combined_stats.reset_index()

del combined_stats["level_1"]

# update column names
combined_stats.columns = ["Ticker", "Attribute", "Recent"]


Price-to-Earnings ratio for each Dow stock

The P/E ratio for each stock can be obtained in a single line:


# get P/E ratio for each stock
combined_stats[combined_stats.Attribute.str.contains("Trailing P/E")


Getting the Price-to-Sales ratio for each Dow stock

After the above code, we can get the Price / Sales ratios for each stock like below.


# get P/S ratio for each stock
combined_stats[combined_stats.Attribute.str.contains("Price/Sales")


How to get Price / Book ratio

Similarly, we can get the Price-to-Book ratio for every stock in our list below.


# get Price-to-Book ratio for each stock
combined_stats[combined_stats.Attribute.str.contains("Price/Book")

How to get PEG ratio

Next, let’s get the PEG (Price / Earnings-to-Growth ratio).


# get PEG ratio for each stock
combined_stats[combined_stats.Attribute.str.contains("PEG")

How to get forward P/E ratios

We can get forward P/E ratios like this:


# get forward P/E ratio for each stock
combined_stats[combined_stats.Attribute.str.contains("Forward P/E")]

Getting additional stats from multiple stocks

In addition to the “Valuation Measures” table on the stats tab, we can also scrape the remaining data points on the webpage using the get_stats method. Calling this method lets us extract metrics like Return on Equity (ROE), Return on Assets, profit margin, etc. Click here to see the webpage for Apple.

Similar to above, we can get this information for each stock in the Dow.


dow_extra_stats = {}
for ticker in tqdm(dow_list):
    dow_extra_stats[ticker] = si.get_stats(ticker)
    

combined_extra_stats = pd.concat(dow_extra_stats)

combined_extra_stats = combined_extra_stats.reset_index()

del combined_extra_stats["level_1"]

combined_extra_stats.columns = ["ticker", "Attribute", "Value"]

How to get Return on Equity (ROE)

Using the result data frame, combined_extra_stats, let’s get Return on Equity for each stock in our list.


combined_extra_stats[combined_extra_stats.Attribute.str.contains("Return on Equity")]

How to get Return on Assets

A simple tweak gives us Return on Assets for each stock.


combined_extra_stats[combined_extra_stats.Attribute.str.contains("Return on Assets")]

How to get profit margin

To get profit margin, we just need to adjust our filter like below.


combined_extra_stats[combined_extra_stats.Attribute.str.contains("Profit Margin")]

How to get balance sheets

We can extract balance sheets from Yahoo Finance using the get_balance_sheet method. Using the data frame that is returned, we can get several attributes about the stock’s financials, including total cash on hand, assets, liabilities, stockholders’ equity, etc.


sheet = si.get_balance_sheet("aapl")

How to get total cash on hand

We can see the “Total Cash” row in the balance sheet by filtering for “cash”. This will give us the total cash value for the last several years.


sheet.loc["cash"]

How to get stockholders’ equity

Next, we can also get Total Stockholders’ Equity.


sheet.loc["totalStockholderEquity"]

How to get a company’s total assets

Now, let’s get Total Assets.


sheet.loc["totalAssets"]

How to get balance sheets for many stocks at once

Like with the company statistics tables we pulled earlier, we can also download the balance sheet for all the stocks in the Dow (or again, a custom list of your choice).


balance_sheets = {}
for ticker in dow_list:
    balance_sheets[ticker] = si.get_balance_sheet(ticker)
    

From here, we could then look at values from the balance sheets across multiple companies at once. For example, the code below combines the balance sheets from each stock in the Dow. Since each individual balance sheet may have different column headers (from different dates), we’ll just get the most recent column of data from the balance sheet for each stock.


recent_sheets = {ticker : sheet.iloc[:,:1] for ticker,sheet in balance_sheets.items()}

for ticker in recent_sheets.keys():
    recent_sheets[ticker].columns = ["Recent"]

# combine all balance sheets together
combined_sheets = pd.concat(recent_sheets)

# reset index to pull in ticker
combined_sheets = combined_sheets.reset_index()

# update column names
combined_sheets.columns = ["Ticker", "Breakdown", "Recent"]

Now we have a data frame containing the balance sheet information for each stock in our list. For example, we can look at the Total Assets for each Dow stock like this:


combined_sheets[combined_sheets.Breakdown == "totalAssets"]

How to get income statements

Next, let’s examine income statements. Income statements can be downloaded from Yahoo Finance using the get_income_statement method. See an example income statement here.


si.get_income_statement("aapl")

Using the income statement, we can examine specific values, such as total revenue, gross profit, total expenses, etc.

Looking at a company’s total revenue

To get the total revenue, we just need to apply a filter like previously.


income.loc["totalRevenue"]

Getting a company’s gross profit

Similarly, we can get the gross profit:


income.loc["grossProfit"]

Getting the income statement from each Dow stock

Next, let’s pull the income statement for each Dow stock.


income_statements = {}
for ticker in dow_list:
    income_statements[ticker] = si.get_income_statement(ticker)

Now, we can look at metrics in the income statement across multiple companies at once. First, we just need to combine the income statements together, similar to how we combined the balance sheets above.


recent_income_statements = {ticker : sheet.iloc[:,:1] for ticker,sheet in income_statements.items()}

for ticker in recent_income_statements.keys():
    recent_income_statements[ticker].columns = ["Recent"]

combined_income = pd.concat(recent_income_statements)

combined_income = combined_income.reset_index()

combined_income.columns = ["Ticker", "Breakdown", "Recent"]


Now that we have a combined view of the income statements across stocks, we can examine specific values in the income statements, such as Total Revenue, for example.


combined_income[combined_income.Breakdown == "totalRevenue"]

How to extract cash flow statements

In this section, we’ll extract cash flow statements. We can do that using the get_cash_flow method.


flow = si.get_cash_flow("aapl")

Here’s the first few rows of the cash flow statement:

Now let’s get the cash flow statements of each Dow stock.


cash_flows = {}
for ticker in dow_list:
    cash_flows[ticker] = si.get_cash_flow(ticker)

Again, we combine the datasets above, using similar code as before.


recent_cash_flows = {ticker : flow.iloc[:,:1] for ticker,flow in cash_flows.items()}


for ticker in recent_cash_flows.keys():
    recent_cash_flows[ticker].columns = ["Recent"]


combined_cash_flows = pd.concat(recent_cash_flows)

combined_cash_flows = combined_cash_flows.reset_index()

combined_cash_flows.columns = ["Ticker", "Breakdown", "Recent"]


Now, we can examine information in the cash flow statements across all the stocks in our list.

Getting dividends paid across companies

One example to look at in a cash flow statement is the amount of dividends paid, which we can see across the companies in our list by using the filter below.


combined_cash_flows[combined_cash_flows.Breakdown == "dividendsPaid"]

Getting stock issuance information

Here’s another example – this time, we’ll look at debt-related numbers across the cash flow statements.


combined_cash_flows[combined_cash_flows.Breakdown == "issuanceOfStock"]

Conclusion

That’s it for this post! Learn more about web scraping by checking out this online course on Udemy that I co-created with 365 Data Science! You’ll learn all about scraping data from different sources, downloading files programmatically, working with APIs, scraping JavaScript-rendered content, and more! Check it out here!

Andrew Treadway

Recent Posts

Software Engineering for Data Scientists (New book!)

Very excited to announce the early-access preview (MEAP) of my upcoming book, Software Engineering for…

1 year ago

How to stop long-running code in Python

Ever had long-running code that you don't know when it's going to finish running? If…

2 years ago

Faster alternatives to pandas

Background If you've done any type of data analysis in Python, chances are you've probably…

3 years ago

Automated EDA with Python

In this post, we will investigate the pandas_profiling and sweetviz packages, which can be used…

3 years ago

How to plot XGBoost trees in R

In this post, we're going to cover how to plot XGBoost trees in R. XGBoost…

3 years ago

Python collections tutorial

In this post, we'll discuss the underrated Python collections package, which is part of the…

3 years ago