Algorithmically trading with r/wallstreetbets discussion data

CHRISTOPHER KARDATZKE
Feb 18, 2021


I’ve spent a large part of the last year working with data from wallstreetbets, the internet’s most active day trading community, and want to show how to create and backtest a simple strategy using the data. With a few extra steps, the code shown here could be modified to algorithmically trade based on discussion by retail investors.

The Strategy

Simplicity is the goal here, as I just want to provide a framework which can be built upon as desired:

  1. Get data on the previous week’s WallStreetBets discussion.
  2. Identify the five most mentioned stocks.
  3. Buy those stocks at the start of the trading week, sizing positions based on how much they were talked about in proportion to each other.
  4. Sell positions at the end of the trading week.
  5. Repeat
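Steps 2 and 3 can be sketched in a few lines of pandas. This is a minimal illustration rather than the backtest code itself: the helper name `size_positions` and the mention counts are made up for the example.

```python
import pandas as pd

def size_positions(mentions: pd.Series, capital: float, top_n: int = 5) -> pd.Series:
    """Split `capital` across the `top_n` most mentioned tickers,
    in proportion to each ticker's share of the top-n mentions."""
    top = mentions.sort_values(ascending=False).head(top_n)
    return capital * top / top.sum()

# Made-up mention counts for one week (illustrative only)
week_mentions = pd.Series(
    {"AAA": 500, "BBB": 250, "CCC": 125, "DDD": 75, "EEE": 50, "FFF": 10}
)
allocations = size_positions(week_mentions, capital=100_000)
print(allocations)
```

The sixth ticker is dropped, and the remaining five split the full $100,000 according to their share of mentions.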

One thing I do not incorporate into this strategy is the sentiment of wallstreetbets towards individual stocks. While the subreddit generally tends towards long positions (“stocks only go up” is a common saying), sentiment might be worth incorporating into a more sophisticated strategy.

Implementation

Getting wallstreetbets discussion data

I used the quiverquant package in Python to easily access wallstreetbets discussion data through Quiver Quantitative’s API.

I use the Institution plan of this API in order to get live comment-level data but, with a few changes to my code, you should be able to implement a similar strategy using the Trader plan.

import quiverquant
import pandas as pd
#Replace <token> with your personal token
quiver = quiverquant.quiver("<token>")
df = quiver.wallstreetbetsComments(date_from="20180901")
Wallstreetbets discussion data

Using the above code, I am able to get a Pandas DataFrame of approximately 3.3 million rows with data on ticker mentions on wallstreetbets going back to September 2018. Note that if you’d like to dive deeper into sentiment analysis of comments, you can use the wallstreetbetsCommentsFull function with the Institution plan.

I will then group the data to get the number of times each ticker was mentioned each week.

#Get date of comments
df["Date"] = df["Datetime"].dt.date
#Get number of comments mentioning each ticker on each day
dfDay = df.groupby(["Ticker","Date"]).count().reset_index()
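#count() tallies the non-null values in each column, so the "Puts" column is reused here as the mention count
dfDay = dfDay.rename(columns={"Puts": "Mentions"})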
dfDay = dfDay[["Ticker", "Date", "Mentions"]]
dfDay["Date"] = pd.to_datetime(dfDay["Date"])
#Get total mentions by week
dfWeek = dfDay.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Ticker'])['Mentions'].sum().reset_index().sort_values('Date')
Wallstreetbets discussion data grouped by week and ticker
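To see how the weekly grouping labels its bins, here is a small sketch on made-up daily counts. With freq='W-MON', each label is the Monday that closes the week, so a week runs from Tuesday through the following Monday.

```python
import pandas as pd

# Made-up daily mention counts (illustrative only)
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-26", "2021-01-29", "2021-02-01", "2021-02-02"]),
    "Ticker": ["AAA", "AAA", "AAA", "AAA"],
    "Mentions": [10, 20, 30, 40],
})

# Same grouping as above: weekly bins ending on Monday, one row per ticker
weekly = (
    daily.groupby([pd.Grouper(key="Date", freq="W-MON"), "Ticker"])["Mentions"]
    .sum()
    .reset_index()
)
print(weekly)
```

The first three rows (Tuesday, Friday, and Monday) land in the week labeled 2021-02-01, while the Tuesday 2021-02-02 row starts the week labeled 2021-02-08.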

Backtesting

Next up is the implementation of the strategy. I’m not going to go into too much depth on this block of code, because I expect that most of you will be more interested in building your own strategies rather than copying the one that I show here.

import yfinance as yf
import datetime as dt
#Ignore tickers with only a single mention in a week
dfLarge = dfWeek[dfWeek["Mentions"]>1]
dfLarge = dfLarge.sort_values("Date", ascending=True)
dates = dfLarge["Date"].unique()
#Initial capital of mock portfolio
capital = 100000
started = False
startedDFW = False
for date in dates[:-2]:
    #Top five tickers for the week, weighted by their share of mentions
    dfW = dfLarge[dfLarge["Date"]==date]
    dfW = dfW.sort_values("Mentions", ascending=False).head(5)
    dfW['prop'] = dfW['Mentions']/dfW["Mentions"].sum()
    dfW['buy'] = capital*dfW['prop']
    #Week labels are Mondays (W-MON), so date+6 days is the following Sunday
    buyDate = date+pd.Timedelta(days=6)
    dfW['buyDate'] = [buyDate]*len(dfW['buy'])
    #Keep a running record of each week's allocations
    if not startedDFW:
        dfWs = dfW
        startedDFW = True
    else:
        dfWs = pd.concat([dfWs, dfW])
    #yfinance treats the end date as exclusive
    sellDate = date+pd.Timedelta(days=15)
    startedWeek = False
    print(date)
    for index, row in dfW.iterrows():
        ticker = row["Ticker"]
        print(ticker)
        #Note: depending on your yfinance version, "Adj Close" may only be returned with auto_adjust=False
        try:
            ytStock = yf.download(ticker, start=str(buyDate.date()), end=str(sellDate.date()), interval="1d").reset_index()
            #Enter at the first available adjusted close, then track the remaining days
            shares = row["buy"]/ytStock["Adj Close"].values[0]
            ytStock = ytStock.iloc[1:]
        except Exception:
            #If price data is unavailable for the ticker, hold SPY instead
            print("Error")
            ytStock = yf.download("SPY", start=str(buyDate.date()), end=str(sellDate.date()), interval="1d").reset_index()
            shares = row["buy"]/ytStock["Adj Close"].values[0]
            ytStock = ytStock.iloc[1:]
        #Dollar value of the position at each day's open and close
        ytStock["OpenAmount"] = ytStock["Open"]*shares
        ytStock["CloseAmount"] = ytStock["Adj Close"]*shares
        ytStock["Ticker"] = [ticker]*len(ytStock["OpenAmount"])
        ytStock = ytStock.ffill()
        ytStock = ytStock.bfill()
        ytStock = ytStock.dropna()
        #Positions for the current week
        if not startedWeek:
            dfCombined = ytStock
            startedWeek = True
        else:
            dfCombined = pd.concat([dfCombined, ytStock])

        #Positions across all weeks
        if not started:
            dfAll = ytStock
            started = True
        else:
            dfAll = pd.concat([dfAll, ytStock])

    #Capital for next week is the sum of each position's final closing value
    capital = 0
    for ticker in dfCombined["Ticker"].unique():
        dfT = dfCombined[dfCombined["Ticker"]==ticker]
        capital+=dfT["CloseAmount"].values[-1]
    print("Week end capital: ", capital)
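The capital update at the end of each week is the core of the loop, and it can be isolated into a small function. This is a simplified sketch with made-up prices; the function name `week_end_capital` and its dictionary inputs are illustrative assumptions, not part of the code above.

```python
def week_end_capital(allocations, buy_prices, sell_prices):
    """Value of the portfolio at the end of the week.

    allocations: dollars put into each ticker at the buy price
    buy_prices / sell_prices: price per share at entry and exit
    """
    total = 0.0
    for ticker, dollars in allocations.items():
        shares = dollars / buy_prices[ticker]   # fractional shares allowed
        total += shares * sell_prices[ticker]
    return total

# Made-up numbers (illustrative only)
allocations = {"AAA": 60_000, "BBB": 40_000}
buy_prices = {"AAA": 10.0, "BBB": 20.0}
sell_prices = {"AAA": 11.0, "BBB": 19.0}

capital = week_end_capital(allocations, buy_prices, sell_prices)
print("Week end capital: ", capital)
```

The result becomes the capital that is split across the next week's five tickers, which is how gains and losses compound from week to week. Like the backtest above, this sketch ignores commissions and slippage.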

Visualization & Analysis

Because I want to compare the performance of this WSB portfolio with the market, I will also get data on the performance of SPY over the same time frame.

dfDay = dfAll.groupby("Date").sum().reset_index()
dfDay["Fund"] = ["WSB"]*len(dfDay["Close"])
dfSPY = yf.download("SPY", start="2018-09-01", end="2021-02-18", interval="1d").reset_index()
dfSPY["Fund"] = ["S&P 500"] * len(dfSPY["Open"])
shares = 100000/dfSPY["Open"].values[0]
dfSPY["OpenAmount"] = dfSPY["Open"]*shares
dfSPY["CloseAmount"] = dfSPY["Close"]*shares
dfCombined = pd.concat([dfDay, dfSPY])
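The benchmark is a buy-and-hold position: all of the starting capital is converted into shares at the first open, and the position is then valued at each day's close. A minimal sketch of that calculation on made-up prices (the helper name `buy_and_hold_value` is hypothetical):

```python
import pandas as pd

def buy_and_hold_value(prices: pd.DataFrame, initial_capital: float) -> pd.Series:
    """Daily closing value of a position entered at the first open."""
    shares = initial_capital / prices["Open"].iloc[0]
    return prices["Close"] * shares

# Made-up prices (illustrative only)
prices = pd.DataFrame({
    "Open": [250.0, 255.0, 260.0],
    "Close": [252.5, 257.5, 275.0],
})
value = buy_and_hold_value(prices, 100_000)
print(value)
```

Starting both portfolios from the same $100,000 is what makes the two lines on the chart directly comparable.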

Now I can graph out how the WSB fund did compared to the market using Plotly.

import plotly.express as px
import plotly
fig = px.line(dfCombined, x="Date", y="CloseAmount", title='WSB', color="Fund", color_discrete_sequence=["rgb(229, 81, 39)","rgb(118, 213, 232)" ])
wsbReturn = (capital-100000)/100000*100
fig.update_layout(title="<b>+"+str(round(wsbReturn, 2))+"% Return</b><br>Aug 2018 - Feb 2021", titlefont=dict(color='rgb(229, 81, 39)', size=20), plot_bgcolor='rgb(32,36,44)', paper_bgcolor='rgb(32,36,44)')
fig.update_xaxes(title_text="",color='white', showgrid=False, tickfont=dict(size=10))
fig.update_yaxes(title_text="$", color='white', showgrid=False, titlefont=dict(size=20),gridcolor="rgb(228,49,34)")
fig.update_layout(
    legend=dict(
        title=dict(text="",font=dict(color='white')),
        x=.85, y=1.15,
        font=dict(
            color='white',
            size=15
        )
    )
)
fig.update_traces(line=dict(width=3))
fig.show()
Performance of wallstreetbets vs. the market
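The plot title is built from a simple percentage return on the starting capital, with a "+" prepended unconditionally. If you adapt the code to a strategy that might lose money, a sign-aware format avoids a title like "+-3.2%". A small sketch, with hypothetical helper names:

```python
def percent_return(final_value: float, initial_value: float) -> float:
    """Simple return over the whole period, in percent."""
    return (final_value - initial_value) / initial_value * 100

def format_return(pct: float) -> str:
    """Format with an explicit sign and two decimals."""
    return f"{pct:+.2f}%"

# Made-up ending values for a $100,000 starting portfolio
print(format_return(percent_return(112_500, 100_000)))
print(format_return(percent_return(96_800, 100_000)))
```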

I can also see what the portfolio consisted of each week.

import plotly.graph_objects as go
fig = go.Figure(px.bar(dfWs, x="buyDate", y="buy", color='Ticker',text='Ticker',color_discrete_sequence=px.colors.qualitative.Light24))
fig.update_layout(title="Portfolio by Week", titlefont=dict(color='rgb(228,49,34)'), plot_bgcolor='rgb(32,36,44)', paper_bgcolor='rgb(32,36,44)')
fig.update_xaxes(title_text="",color='white', showgrid=False, fixedrange=False)
fig.update_yaxes(title_text="$",color='white', showgrid=False, fixedrange=False,gridwidth=1,gridcolor="rgb(109,177,174)")
fig.update_layout(
    legend=dict(
        title=dict(text="Ticker",font=dict(color="white")),
        font=dict(
            color='white'
        ),
    )
)
fig.show()
Portfolio by week
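If you prefer a table to a chart, the same weekly allocations can be pivoted into one row per week and one column per ticker. This sketch uses made-up data in the same shape as dfWs (the columns buyDate, Ticker, and buy) and converts dollars into portfolio weights.

```python
import pandas as pd

# Made-up allocations shaped like dfWs (illustrative only)
allocs = pd.DataFrame({
    "buyDate": pd.to_datetime(["2021-01-31", "2021-01-31", "2021-02-07", "2021-02-07"]),
    "Ticker": ["AAA", "BBB", "AAA", "CCC"],
    "buy": [60_000.0, 40_000.0, 30_000.0, 70_000.0],
})

# One row per week, one column per ticker, dollars allocated as values
weights = allocs.pivot_table(
    index="buyDate", columns="Ticker", values="buy", aggfunc="sum", fill_value=0
)
# Divide each row by its total so the weights in a week sum to 1
weights = weights.div(weights.sum(axis=1), axis=0)
print(weights.round(2))
```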

This chart is hard to read as a static image, but I put interactive versions of the visualizations up on this dashboard, where you can zoom and hover to see the details.

Conclusion

It probably goes without saying that the past performance of this strategy is no indication of future results and that this post is not intended as financial advice.

That being said, I do think that, in the right hands, there is strong potential in using data from wallstreetbets discussion to generate alpha.
