Algorithmically trading with r/wallstreetbets discussion data
I’ve spent a large part of the last year working with data from wallstreetbets, the internet’s most active day trading community, and want to show how to create and backtest a simple strategy using the data. With a few extra steps, the code shown here could be modified to algorithmically trade based on discussion by retail investors.
The Strategy
Simplicity is the goal here, as I just want to provide a framework which can be built upon as desired:
- Get data on the previous week’s WallStreetBets discussion.
- Identify the five most mentioned stocks.
- Buy those stocks at the start of the trading week, sizing positions based on how much they were talked about in proportion to each other.
- Sell positions at the end of the trading week.
- Repeat
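The position-sizing step in the list above can be sketched in a few lines of plain Python. This is only an illustration with made-up mention counts; the function name `size_positions` and its parameters are hypothetical and are not part of any library used in this post.

```python
def size_positions(mentions, capital, top_n=5):
    """Split capital across the top_n most mentioned tickers,
    in proportion to each ticker's share of those mentions."""
    # Keep the top_n tickers by mention count
    top = sorted(mentions.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(count for _, count in top)
    # Dollar amount per ticker = capital * (ticker mentions / total mentions)
    return {ticker: capital * count / total for ticker, count in top}


# Made-up weekly mention counts, for illustration only
weekly_mentions = {"GME": 500, "AMC": 300, "TSLA": 100, "PLTR": 50, "NIO": 50, "AAPL": 10}
positions = size_positions(weekly_mentions, capital=100000)
print(positions)
```

Here AAPL is dropped because it falls outside the top five, and GME receives half of the capital because it accounts for half of the remaining mentions.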
One thing I do not incorporate into this strategy is the sentiment of wallstreetbets towards individual stocks. While the subreddit generally leans towards long positions (“stocks only go up” is a common saying), sentiment might be worth adding to a more sophisticated strategy.
Implementation
Getting wallstreetbets discussion data
I used the quiverquant package in Python to easily access wallstreetbets discussion data through Quiver Quantitative’s API.
I use the Institution plan of this API in order to get live comment-level data but, with a few changes to my code, you should be able to implement a similar strategy using the Trader plan.
import quiverquant
import pandas as pd

#Replace <token> with your personal token
quiver = quiverquant.quiver("<token>")
df = quiver.wallstreetbetsComments(date_from="20180901")
Using the above code, I am able to get a Pandas DataFrame of approximately 3.3 million rows with data on ticker mentions on wallstreetbets going back to September 2018. Note that if you’d like to dive deeper into sentiment analysis of comments, you can use the wallstreetbetsCommentsFull function with the Institution plan.
I will then group the data to get the number of times each ticker was mentioned each week.
#Get date of comments
df["Date"] = df["Datetime"].dt.date

#Get number of comments mentioning each ticker on each day
dfDay = df.groupby(["Ticker","Date"]).count().reset_index()
dfDay = dfDay.rename(columns={"Puts": "Mentions"})
dfDay = dfDay[["Ticker", "Date", "Mentions"]]
dfDay["Date"] = pd.to_datetime(dfDay["Date"])

#Get total mentions by week
dfWeek = dfDay.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Ticker'])['Mentions'].sum().reset_index().sort_values('Date')
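It can help to see how the `W-MON` frequency bins dates before relying on it. The sketch below runs the same kind of grouping on a tiny made-up table (the numbers and the `toy` and `weekly` names are illustrative only) so you can check which week label each day lands in.

```python
import pandas as pd

# Toy daily mention counts (made-up numbers, for illustration only)
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-04", "2021-01-05", "2021-01-08", "2021-01-11", "2021-01-12"]
    ),
    "Ticker": ["GME"] * 5,
    "Mentions": [10, 20, 30, 40, 50],
})

# Weekly bins that end on Monday; each bin is labelled with its closing Monday
weekly = (
    toy.groupby([pd.Grouper(key="Date", freq="W-MON"), "Ticker"])["Mentions"]
    .sum()
    .reset_index()
)
print(weekly)
```

Each weekly row is labelled with the Monday that closes its window, so Tuesday 2021-01-05 through Monday 2021-01-11 are summed under the 2021-01-11 label. The backtest offsets its buy and sell dates from that label.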
Backtesting
Next up is the implementation of the strategy. I’m not going to go into too much depth on this block of code, because I expect that most of you will be more interested in building your own strategies rather than copying the one that I show here.
import yfinance as yf
import datetime as dt

dfLarge = dfWeek[dfWeek["Mentions"]>1]
dfLarge = dfLarge.sort_values("Date", ascending=True)
dates = dfLarge["Date"].unique()

#Initial capital of mock portfolio
capital = 100000
started = False
startedDFW = False

for date in dates[:-2]:
    #Top five tickers by mentions for the week, weighted by share of mentions
    dfW = dfLarge[dfLarge["Date"]==date]
    dfW = dfW.sort_values("Mentions", ascending=False).head(5)
    dfW['prop'] = dfW['Mentions']/dfW["Mentions"].sum()
    dfW['buy'] = capital*dfW['prop']
    buyDate = date+pd.Timedelta(days=6)
    dfW['buyDate'] = [buyDate]*len(dfW['buy'])
    if not startedDFW:
        dfWs = dfW
        startedDFW = True
    else:
        dfWs = pd.concat([dfWs, dfW])
    sellDate = date+pd.Timedelta(days=15)
    startedWeek = False
    print(date)
    for index, row in dfW.iterrows():
        ticker = row["Ticker"]
        print(ticker)
        try:
            ytStock = yf.download(ticker, start=str(buyDate.date()), end=str(sellDate.date()), interval="1d").reset_index()
            shares = row["buy"]/ytStock["Adj Close"].values[0]
            ytStock = ytStock.iloc[1:]
        except:
            #If the download or price lookup fails, hold SPY in that slot instead
            print("Error")
            ytStock = yf.download("SPY", start=str(buyDate.date()), end=str(sellDate.date()), interval="1d").reset_index()
            shares = row["buy"]/ytStock["Adj Close"].values[0]
            ytStock = ytStock.iloc[1:]
        ytStock["OpenAmount"] = ytStock["Open"]*shares
        ytStock["CloseAmount"] = ytStock["Adj Close"]*shares
        ytStock["Ticker"] = [ticker]*len(ytStock["OpenAmount"])
        ytStock = ytStock.ffill()
        ytStock = ytStock.bfill()
        ytStock = ytStock.dropna()
        if not startedWeek:
            dfCombined = ytStock
            startedWeek = True
        else:
            dfCombined = pd.concat([dfCombined, ytStock])
        if not started:
            dfAll = ytStock
            started = True
        else:
            dfAll = pd.concat([dfAll, ytStock])
    #Capital carried into the next week is the sum of each position's final close value
    capital = 0
    for ticker in dfCombined["Ticker"].unique():
        dfT = dfCombined[dfCombined["Ticker"]==ticker]
        capital+=dfT["CloseAmount"].values[-1]
    print("Week end capital: ", capital)
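The arithmetic at the heart of the loop is easier to follow in isolation: dollars allocated to each ticker are turned into shares at the entry price, and the week's ending capital is the value of those shares at the exit price. Here is a minimal sketch of that step with made-up tickers and prices; `week_end_capital` is a hypothetical helper, not a function from the code above.

```python
def week_end_capital(allocations, buy_prices, sell_prices):
    """Value of the portfolio after one week, given the dollars
    allocated per ticker and each ticker's entry and exit price."""
    total = 0.0
    for ticker, dollars in allocations.items():
        shares = dollars / buy_prices[ticker]    # shares bought at entry
        total += shares * sell_prices[ticker]    # value of those shares at exit
    return total


# Made-up tickers and prices, for illustration only
allocations = {"AAA": 60000, "BBB": 40000}
buy_prices = {"AAA": 10.0, "BBB": 20.0}
sell_prices = {"AAA": 11.0, "BBB": 19.0}

end_capital = week_end_capital(allocations, buy_prices, sell_prices)
print(end_capital)
```

Feeding each week's result back in as the next week's starting capital is what compounds the returns over the backtest.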
Visualization & Analysis
Because I want to compare the performance of this WSB portfolio with the market, I will also get data on the performance of SPY over the same time frame.
dfDay = dfAll.groupby("Date").sum().reset_index()
dfDay["Fund"] = ["WSB"]*len(dfDay["Close"])

dfSPY = yf.download("SPY", start="2018-09-01", end="2021-02-18", interval="1d").reset_index()
dfSPY["Fund"] = ["S&P 500"] * len(dfSPY["Open"])
shares = 100000/dfSPY["Open"].values[0]
dfSPY["OpenAmount"] = dfSPY["Open"]*shares
dfSPY["CloseAmount"] = dfSPY["Close"]*shares
dfCombined = pd.concat([dfDay, dfSPY])
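The SPY benchmark is a plain buy-and-hold: put all of the starting capital in at the first open and track what those shares are worth at each close. A small sketch of that calculation with made-up prices follows; the helper names `buy_and_hold_values` and `percent_return` are hypothetical.

```python
def buy_and_hold_values(first_open, closes, initial_capital=100000):
    """Dollar value of a buy-and-hold position at each close."""
    shares = initial_capital / first_open
    return [close * shares for close in closes]


def percent_return(start_value, end_value):
    """Simple percentage return from start to end."""
    return (end_value - start_value) / start_value * 100


# Made-up prices, for illustration only
values = buy_and_hold_values(first_open=250.0, closes=[250.0, 275.0, 300.0])
benchmark_return = percent_return(100000, values[-1])
print(values)
print(benchmark_return)
```

Starting both the WSB portfolio and the benchmark from the same dollar amount is what lets the two lines share one chart.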
Now I can graph out how the WSB fund did compared to the market using Plotly.
import plotly.express as px
import plotly

fig = px.line(dfCombined, x="Date", y="CloseAmount", title='WSB', color="Fund", color_discrete_sequence=["rgb(229, 81, 39)","rgb(118, 213, 232)" ])

wsbReturn = (capital-100000)/100000*100

fig.update_layout(title="<b>+"+str(round(wsbReturn, 2))+"% Return</b><br>Aug 2018 - Feb 2021", titlefont=dict(color='rgb(229, 81, 39)', size=20), plot_bgcolor='rgb(32,36,44)', paper_bgcolor='rgb(32,36,44)')
fig.update_xaxes(title_text="",color='white', showgrid=False, tickfont=dict(size=10))
fig.update_yaxes(title_text="$", color='white', showgrid=False, titlefont=dict(size=20),gridcolor="rgb(228,49,34)")
fig.update_layout(
    legend=dict(
        title=dict(text="",font=dict(color='white')),
        x=.85, y=1.15,
        font=dict(
            color='white',
            size=15
        )
    )
)
fig.update_traces(line=dict(width=3))
fig.show()
I can also see what the portfolio was made up of each week.
import plotly.graph_objects as go

fig = go.Figure(px.bar(dfWs, x="buyDate", y="buy", color='Ticker',text='Ticker',color_discrete_sequence=px.colors.qualitative.Light24))

fig.update_layout(title="Portfolio by Week", titlefont=dict(color='rgb(228,49,34)'), plot_bgcolor='rgb(32,36,44)', paper_bgcolor='rgb(32,36,44)')
fig.update_xaxes(title_text="",color='white', showgrid=False, fixedrange=False)
fig.update_yaxes(title_text="$",color='white', showgrid=False, fixedrange=False,gridwidth=1,gridcolor="rgb(109,177,174)")
fig.update_layout(
    legend=dict(
        title=dict(text="Ticker",font=dict(color="white")),
        font=dict(
            color='white'
        ),
    )
)
fig.show()
This graphic is hard to read as a static image, but I put interactive versions of the visualizations up on this dashboard, where you can zoom and hover to see the details.
Conclusion
It probably goes without saying that the past performance of this strategy is no indication of future results and that this post is not intended as financial advice.
That being said, I do think that, in the right hands, there is strong potential in using data from wallstreetbets discussion to generate alpha.