We can grab stock market data from Quandl. We’re going to focus on the S&P 500, a market cap weighted index of large cap American stocks (basically, the biggest American companies).
Here is the Python code for grabbing the data (note that you will need your own Quandl API key and that the Sharadar data requires a paid subscription). I’ve added comments in the code, but in short, the snippet uses Quandl’s API to grab financial data for all publicly listed companies in Sharadar’s database (financial data is stuff like revenues, net income, debt, return on invested capital, and so on) and then keeps only the annual rows dated 2018-12-31. The filtered data is stored in the dataframe stock_df.
import numpy as np
import pandas as pd
import quandl

quandl.ApiConfig.api_key = 'your_quandl_api_key'

# Download financial data from Quandl/Sharadar
table = quandl.get_table('SHARADAR/SF1', paginate=True)

# Grab the most recent annual data ('MRY' denotes annual data)
stock_df = table[(table['calendardate'] == '2018-12-31 00:00:00') & (table['dimension']=='MRY')]
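If you don't have a Sharadar subscription, here is a minimal sketch of the same filtering step on a toy dataframe. The tickers and revenue numbers are made up, and 'MRQ' is only used here as an example of a non-annual dimension code; the point is just to show how the two boolean conditions combine with &.

```python
import pandas as pd

# Toy stand-in for the SHARADAR/SF1 table; all values below are made up.
table = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'dimension': ['MRY', 'MRQ', 'MRY', 'MRY'],
    'calendardate': pd.to_datetime(
        ['2018-12-31', '2018-12-31', '2018-12-31', '2017-12-31']),
    'revenue': [100.0, 30.0, 250.0, 200.0],
})

# Build one boolean mask per condition, then combine them with &.
is_annual = table['dimension'] == 'MRY'
is_2018 = table['calendardate'] == pd.Timestamp('2018-12-31')
stock_df = table[is_annual & is_2018]

print(stock_df)
```

Only the rows that satisfy both conditions survive, so each ticker ends up with at most one row, which is what we want before merging later on.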
However, we don’t need data on every company, we just want the S&P 500. To get the S&P 500 tickers, we turn to web scraping and Wikipedia. The following lines of code download the Wikipedia page that lists the tickers and industries of every company currently in the S&P 500, parse it, and store the parsed page in the variable called page_content.
# Scrape S&P 500 tickers from Wikipedia
import re
import requests
from bs4 import BeautifulSoup

page_link = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
page_response = requests.get(page_link, timeout=1000)
page_content = BeautifulSoup(page_response.content, 'lxml')
Next we use some parsing code to grab ticker and industry from page_content, which is a BeautifulSoup data structure. The code can be summarized as doing the following:
- Loop through all the ‘tr’ tagged items in page_content (‘tr’ is the HTML tag for a table row).
- Grab the ticker as a string (the .text attribute returns the item’s text as a string) and store it in the list tickers.
- Grab the industry as a string and store it in the list industries.
# From web scraped Wikipedia content on the S&P 500, grab the ticker and industry for each firm
tickers = []
industries = []
for i, val in enumerate(page_content.find_all('tr')):
    if i > 0:
        # Here is where I grab the ticker
        # Here is where I grab the industry
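To make the loop concrete, here is a minimal sketch of the same idea run against a tiny hand-written table instead of the live Wikipedia page. The HTML, the column positions, and the company names are all assumptions for illustration (the real page's layout may differ and changes over time), and I use the built-in 'html.parser' so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Tiny hand-written table standing in for the Wikipedia page.
# The column layout here is an assumption, not the real page's layout.
html = """
<table>
  <tr><th>Symbol</th><th>Security</th><th>Sector</th></tr>
  <tr><td><a href="#">AAA</a></td><td>Alpha Corp</td><td>Industrials</td></tr>
  <tr><td><a href="#">BBB</a></td><td>Beta Inc</td><td>Health Care</td></tr>
</table>
"""
page_content = BeautifulSoup(html, 'html.parser')

tickers = []
industries = []
for i, row in enumerate(page_content.find_all('tr')):
    if i > 0:  # skip the header row
        cells = row.find_all('td')
        if len(cells) < 3:
            continue  # skip rows that do not have the expected columns
        # .text returns the cell's text as a string; strip() removes whitespace
        tickers.append(cells[0].text.strip())
        industries.append(cells[2].text.strip())

print(tickers)
print(industries)
```

Checking the number of cells before indexing keeps the two lists the same length, which matters because we later put them side by side in a dataframe.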
Finally, we need to do a bit of data manipulation to join up our ticker and industry data (from Wikipedia) with our financial data (from Quandl). The first two blocks of code (as detailed in the comments) store the tickers that we want into a dataframe called sp_df.
The last block of code uses the pandas merge method to join sp_df, which contains our ticker and industry data, with stock_df, which contains our financial data. We store the resulting dataframe, merged_df, in a .csv file that we can now load into Tableau (I use Tableau Public, which is free). Hurray, done!
# After ticker ZTS, the rest of the table entries are for acquisitions/mergers
last_pos = tickers.index('ZTS') + 1
sp_tickers = tickers[:last_pos]
sp_industries = industries[:last_pos]

# Create a new dataframe for S&P 500 and merge in Quandl data
sp_df = pd.DataFrame()
sp_df['tickers'] = sp_tickers
sp_df['industries'] = sp_industries

merged_df = sp_df.merge(stock_df, left_on='tickers', right_on='ticker', how='left')
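Here is a small sketch of what the left merge does, using made-up dataframes in place of the scraped and downloaded data. The indicator=True argument is my addition (it is not in the code above); it adds a _merge column that makes it easy to spot tickers with no match in the financial data. The in-memory buffer stands in for whatever .csv file path you would actually write to.

```python
import io
import pandas as pd

# Made-up stand-ins for the scraped S&P 500 list and the Quandl data.
sp_df = pd.DataFrame({
    'tickers': ['AAA', 'BBB', 'CCC'],
    'industries': ['Industrials', 'Health Care', 'Energy'],
})
stock_df = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'ZZZ'],
    'revenue': [100.0, 250.0, 75.0],
})

# how='left' keeps every row of sp_df; tickers missing from stock_df get NaN.
merged_df = sp_df.merge(stock_df, left_on='tickers', right_on='ticker',
                        how='left', indicator=True)

# Tickers that found no match in the financial data.
unmatched = merged_df.loc[merged_df['_merge'] == 'left_only', 'tickers'].tolist()
print(unmatched)

# Write the result as CSV text; an in-memory buffer stands in for a file path.
buffer = io.StringIO()
merged_df.drop(columns='_merge').to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Because it is a left merge, a company that is in the S&P 500 list but missing from the financial data still shows up in merged_df, just with empty financial columns, so it is worth checking the unmatched list before charting anything.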