WallStreetBots: Stock Price Prediction

Proposal

Medium Article

Introduction

Creating a stock trading AI is far more sophisticated than predicting future prices based on numerical methods such as momentum, or machine learning methods such as LSTM or transformers. While the later one is pure data analysis and machine learning, the former one covers a lot greater scopes, from not only price prediction, but also portfolio management strategies, market condition analysis, stock sentiment analysis, data communications, and deployment on server. In this project, we want to create an AI that is practically useful - yes, that means it makes money. We will try a variety of ML algorithms, from simple RNN to NLP, and hopefully reinforcement learning — and at the end our work will be deployed as a bot to trade with paper money. Will it be able to make money? Let’s find out!

Framework

Dataset

To obtain real-time data such as order types, store stock information, and simulate the trading environment realistically, we built our framework on top of the Alpaca paper trade API. Whenever a user places an order at wallstreetbots.org, the same information is updated at Alpaca paper trade and vice versa.

Alpaca-API

Webapp

The web app was built using the core technologies of docker, Python+Django, and psql. The web app is roughly split into the basic CRUD functionality of the dashboard for showing portfolios and entering orders, and the actual stock trading engine that makes the decisions. ML strategies are fed into the generic pipeline that is called by the web app every day to automatically rebalance for each user. Each new machine learning strategy inherits from a generic strategy and overrides with a different rebalancing method. This ensures that adding new strategies is as easy as possible.

Functionalities

Our ML models predict next-day closing/open stock prices by probing into both traditional market data and sentiment analysis from real-time news headlines and reddit discussions. The predictions will then be interpreted by our portfolio balancing pipeline where we place trade orders based on the differencse between the optimal weights and our current portfolio weights each day.

Architecture

WallStreetBots Architecture

The image above summarizes the entire pipeline of WallStreetBots.

Besides the market data obtained from Alpaca, we also pulled textual information for sentiment analysis from:

  • NYT News Archive
  • FinViz News Headlines
  • r/wallstreetbets Daily Discussion Comments
  • RSS feeds from various news sources

These textual information are combined with traditional market data when fed into our distilled LSTM model. The input features are summarized in the table below:

Input features for WallStreetBots

Results

Results of models tested Performance of our best model – Distilled LSTM

Futute Works

Building on top of the success of this project in 2022, we will be expanding our framework to the market of cryptocurrencies, where quantitative analysis is expected to yield much better correlations. We are expecting to perform price predictions for the 14 most common cryptos by the end of this year. For more information regarding our proposal and recruitment, please visit https://utmist.gitlab.io/projects/wallstreetbots2/.

We also plan to improve our textual analysis pipeline by incorporating twitter data, which has been shown to influence stock prices in a wide range of time span. Our NLP subteam will be experimenting novel methodologies this year to understand the correlations between tweets and their temporal influence on stock/crypto prices.