Training ML Models 10x Faster for Stock Price Prediction

By Intrinio
September 11, 2023

Are you ready to revolutionize your trading strategies and take your predictions to the next level? We are going to walk you through a groundbreaking solution that can supercharge your short-term price predictions. It’s all part of a recent collaboration between NVIDIA, the leading innovator in data science and accelerated computing, and Intrinio - a powerfully flexible market data API provider.

The Intrinio team worked with Mark Benett (Senior Data Scientist, NVIDIA), Prabhu Ramamoorthy, CFA/FRM, of the NVIDIA Financial Services team, and other team members to answer an important question: What happens when you put quality market data together with the most recent advancements in GPU acceleration?

Leveraging ML Models for Stock Prediction

Let’s explore how to leverage machine learning to make stock market predictions more efficiently. If you are new to the high-frequency trading or quant space, here’s why this matters. High-frequency traders and quants generate alpha by predicting where the market is heading and then getting ahead of it. This requires finding a quality stock price data set, developing a machine-learning algorithm, and then training it. The faster you can train your ML models, the more experiments you can run and the more quickly the models improve.

The result of this is better price prediction and more alpha. NVIDIA engineers recently conducted an experiment to accelerate ML-model training, and the results will likely surprise you. So, what’s the big secret they discovered?

Before we dive into the details, there are two essential resources you’ll need to get in order if you want to replicate their results.

Market Data Provider

First, you’ll need to find a quality, reliable, and affordable provider of the underlying market data you’ll be using in your models.

In this case, market data refers to an accurate set of real historical stock prices. Reliable and accurate stock price data is a MUST as a foundation for a good price prediction model. As the age-old saying goes: garbage in, garbage out.

Intrinio offers the most reliable, high-quality, and affordable stock price data feeds on the market, backed by a powerful, well-documented API, WebSocket, and a full suite of SDKs. Both the Intrinio API and WebSocket are ideally set up for quants and high-frequency traders who are building models to make stock market predictions.

In this case, NVIDIA engineers used Intrinio’s time series dataset with actual real-time stock prices for NYSE and NASDAQ tickers from the Dow 30 stocks on a 1-second basis. Before you license market data, be sure to check out some of our resources, such as “Why Work With a Financial Data Provider,” “Top 5 Mistakes to Avoid When Buying Financial Data,” and “Why There is No Such Thing as Free Real Time Stock Prices.”

Installing Your ML Libraries

After you have access to a reliable data feed, you’ll need to install your ML libraries and get your environment set up. The first step is installing an NVIDIA driver and the CUDA toolkit (which you can find on the NVIDIA website). You’ll need access to a GPU with compute capability 6.0 or above (the NVIDIA Pascal architecture or newer).

After that, head to GitHub to install the ABIDES library - this stands for Agent-Based Interactive Discrete Event Simulation Environment - more on this later. Lastly, you’ll want to install RAPIDS - this is the secret sauce. Powered by NVIDIA's cutting-edge technology, RAPIDS has changed the game for data scientists and traders.

It leverages the power of GPUs, allowing you to process massive amounts of data in record time using familiar APIs and popular libraries.

With that in mind, here’s a handy checklist:

  • Recent CUDA version and NVIDIA driver pairs
  • GPU with compute capability 6.0 or above
  • ABIDES market simulator
  • RAPIDS libraries (cuDF and cuML)
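
If you want to check the GPU item on this list from Python, here is a minimal sketch. It assumes the nvidia-smi command-line tool that ships with the NVIDIA driver is on your PATH and that your driver version supports the compute_cap query field (older drivers may not). The helper names below are our own, not part of any NVIDIA library.

```python
import shutil
import subprocess

# Minimum compute capability named in the checklist above.
MIN_COMPUTE_CAPABILITY = (6, 0)


def parse_capability(text):
    """Turn a string such as '8.0' into a comparable tuple such as (8, 0)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(capability_text, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if the reported compute capability is at least `minimum`."""
    return parse_capability(capability_text) >= minimum


def query_gpus():
    """Return [(name, capability_text), ...], or None if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    gpus = []
    for line in result.stdout.splitlines():
        if line.strip():
            # Split on the last comma so GPU names containing commas survive.
            name, capability = line.rsplit(",", 1)
            gpus.append((name.strip(), capability.strip()))
    return gpus


gpus = query_gpus()
if gpus is None:
    print("Could not query a GPU with nvidia-smi on this machine.")
else:
    for name, capability in gpus:
        try:
            verdict = "OK" if meets_minimum(capability) else "too old"
        except ValueError:
            verdict = "could not parse"
        print(f"{name}: compute capability {capability} ({verdict})")
```

On a machine without an NVIDIA driver the script simply reports that it could not query a GPU.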

Once you have your data flowing and this environment set up, you are ready to go! 

How NVIDIA Approached Limit Order Books, Model Training & Price Prediction

Now, we will walk you through the steps that NVIDIA took to model limit order book structure, train models more efficiently, and improve price prediction.

To keep things simple, NVIDIA focused exclusively on the AAPL (Apple, Inc.) ticker. They leveraged the limit-order-book (LOB) as a predictor of short term price movements, so let’s start with a little refresher on limit orders. Buyers of a stock want to pay as little as possible, and sellers want to earn as much as possible.

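A limit order lets you set the highest price you are willing to pay as a buyer (the bid) or the lowest price you are willing to accept as a seller (the ask). A limit order book, then, is a list of the outstanding limit orders for a security. It is usually charted with prices on the x-axis and the total volume at each price on the y-axis, for both bids and asks.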
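
To make the idea concrete, here is a minimal sketch of a limit order book in plain Python. The order tuples, the integer-cents prices, and the build_book helper are all illustrative assumptions of ours, not the data format used by NVIDIA, ABIDES, or Intrinio.

```python
from collections import defaultdict


def build_book(orders):
    """Aggregate resting limit orders into total volume per price level.

    Each order is a (side, price_in_cents, size) tuple, where side is
    "bid" or "ask". Prices are integer cents to avoid float rounding.
    """
    bids = defaultdict(int)
    asks = defaultdict(int)
    for side, price_cents, size in orders:
        if side == "bid":
            bids[price_cents] += size
        elif side == "ask":
            asks[price_cents] += size
        else:
            raise ValueError(f"unknown side: {side!r}")
    # Best prices first: highest bid, lowest ask.
    bid_levels = sorted(bids.items(), reverse=True)
    ask_levels = sorted(asks.items())
    return bid_levels, ask_levels


# Hypothetical orders, made up for illustration.
orders = [
    ("bid", 15000, 100),
    ("bid", 15000, 50),
    ("bid", 14999, 200),
    ("ask", 15002, 80),
    ("ask", 15003, 120),
    ("ask", 15002, 20),
]

bid_levels, ask_levels = build_book(orders)
print(bid_levels)  # [(15000, 150), (14999, 200)]
print(ask_levels)  # [(15002, 100), (15003, 120)]
```

Each (price, volume) pair is one "level" of the book, which is the unit the rest of this article refers to.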

Here’s an example of GOOG (Google’s) limit order book:

These two charts are from two different points in time - about 292 microseconds apart. The red line in the middle represents the mid-quote price between the bid and the ask prices. You can see that the bid prices - on the left - are lower than the midpoint, and the ask prices - on the right - are higher.
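
The mid-quote price is simply the midpoint between the best (highest) bid and the best (lowest) ask. A minimal sketch, assuming each side of the book is a list of (price_in_cents, volume) levels; the function name and the sample numbers are ours:

```python
def mid_quote(bid_levels, ask_levels):
    """Return (mid_price, spread) from two lists of (price, volume) levels."""
    best_bid = max(price for price, _ in bid_levels)
    best_ask = min(price for price, _ in ask_levels)
    if best_bid >= best_ask:
        raise ValueError("crossed book: best bid is not below best ask")
    return (best_bid + best_ask) / 2, best_ask - best_bid


# Hypothetical two-level book, prices in cents.
bid_levels = [(15000, 150), (14999, 200)]
ask_levels = [(15002, 100), (15003, 120)]

mid, spread = mid_quote(bid_levels, ask_levels)
print(mid, spread)  # 15001.0 2
```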

As you can see on this chart, more levels in the book lead to better price prediction because a classifier simply has access to more information during training.

Figure 3. Limit order book depth can vary. The ML mid-price direction prediction accuracy is more robust with more levels in the book. Image credit: Faisal | Qureshi

NVIDIA’s first goal was to generate plausible limit-order-book (LOB) data that could be used to train their model. They used ABIDES, an agent-based simulator of financial markets. ABIDES simulates individual traders buying and selling assets through exchanges so that you can analyze the results of different strategies.

Leveraging Intrinio’s real historical 1-second quotes with ABIDES helped simulate an even more realistic market. These historical prices can be thought of as the “fundamental” or true value of the securities historically. Check out this chart that compares the mid-price of NVIDIA’s output LOB data with the 1-second quotes from Intrinio.

Next, they used this ABIDES-generated LOB data to train a random forest model to predict short-term price movements: will the stock price move upwards, downwards, or stay flat?

Specifically, they trained a classifier to predict whether the average of the next 20 mid-prices (mnext) will be smaller or larger than the average of the previous 20 mid-prices (mprev) by a certain margin. Mid-prices and averages were calculated using both the RAPIDS cuDF library and pandas. 
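
Here is a rough sketch of the two averages in plain Python. The exact window alignment is our assumption (the previous window includes the current frame, the next window starts one frame later), and window_means is a name we made up. With pandas or cuDF, the same quantities would typically come from a rolling mean over the mid-price column.

```python
def window_means(mids, t, window=20):
    """Return (m_prev, m_next) around frame index t.

    Assumption: m_prev averages mids[t-window+1 .. t] and
    m_next averages mids[t+1 .. t+window].
    """
    if t - window + 1 < 0 or t + window >= len(mids):
        raise IndexError("not enough frames on one side of t")
    prev = mids[t - window + 1 : t + 1]
    nxt = mids[t + 1 : t + window + 1]
    return sum(prev) / window, sum(nxt) / window


# Hypothetical mid-prices in cents, rising by one cent per frame.
mids = [15000 + i for i in range(40)]

m_prev, m_next = window_means(mids, t=19)
print(m_prev, m_next, m_next - m_prev)  # 15009.5 15029.5 20.0
```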

Here are some details on the set-up:

  • Dataset: 90 days of ABIDES-generated data (7.5 million labeled LOB frames)
  • Margin: 0.5 cents (smallest non-zero difference in mid-prices between any two LOB frames in the dataset)
  • Training Set: 66% of the data
  • Testing Set: 33% of the data
  • Random Forest Model Input: 40 features (bid and ask prices and volumes at 10 LOB levels)
  • A label of 2: upward price movement (mnext – mprev > 0.5 cents)
  • A label of 1: neutral price movement
  • A label of 0: downward price movement (mnext – mprev < -0.5 cents)
  • Environment
    - One NVIDIA A100 80 GB SXM for RAPIDS cuDF and RAPIDS cuML
    - Two AMD EPYC 7742 64-core processors for scikit-learn and pandas
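
The feature and label definitions above can be sketched in a few lines of plain Python. The ordering of the 40 features and the helper names are our assumptions; the thresholds and the 2/1/0 label values come from the list above, with prices expressed in cents.

```python
def to_features(bid_levels, ask_levels, depth=10):
    """Flatten the top `depth` levels into one vector of 4 * depth values.

    Assumed order per level: bid price, bid volume, ask price, ask volume.
    """
    if len(bid_levels) < depth or len(ask_levels) < depth:
        raise ValueError(f"need at least {depth} levels on each side")
    features = []
    for (bid_px, bid_vol), (ask_px, ask_vol) in zip(
        bid_levels[:depth], ask_levels[:depth]
    ):
        features.extend([bid_px, bid_vol, ask_px, ask_vol])
    return features


def label(m_prev, m_next, margin=0.5):
    """Map the change in average mid-price (in cents) to 2, 1, or 0."""
    diff = m_next - m_prev
    if diff > margin:
        return 2  # upward price movement
    if diff < -margin:
        return 0  # downward price movement
    return 1  # neutral price movement


# Hypothetical 10-level book, prices in cents, 100 shares per level.
bid_levels = [(15000 - i, 100) for i in range(10)]
ask_levels = [(15001 + i, 100) for i in range(10)]

features = to_features(bid_levels, ask_levels)
print(len(features))  # 40
print(label(15009.5, 15029.5), label(15000.0, 15000.5), label(15000.0, 14999.0))  # 2 1 0
```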

Note from the environment listed at the end that two configurations were run, one on CPU and one on GPU - and just LOOK at these results. First, check out the difference in preprocessing time between using CPU with pandas and GPU with cuDF.

Figure 4. Comparison of mean preprocessing time on CPU with pandas, and on GPU with cuDF

And in this chart you can see that training with cuML on GPU instead of scikit-learn on CPU is nearly TEN TIMES FASTER.

Figure 5. Training runtime in seconds for scikit-learn on CPU and cuML on GPU
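
If you want to run this kind of comparison yourself, a small timing helper is enough to get started. This is a generic sketch, not the benchmark harness NVIDIA used; mean_runtime and the stand-in preprocess function are our own. Keep in mind that GPU libraries can run work asynchronously, so make sure the result is fully computed before you stop the clock.

```python
import time
from statistics import mean


def mean_runtime(fn, repeats=5):
    """Call fn() `repeats` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def preprocess():
    """Stand-in workload: average 100,000 synthetic mid-prices (in cents)."""
    mids = [(15000 + i % 7 + 15002 + i % 5) / 2 for i in range(100_000)]
    return sum(mids) / len(mids)


elapsed = mean_runtime(preprocess)
print(f"mean preprocessing time: {elapsed:.4f} s")
```

Swap preprocess for your own pandas or cuDF pipeline to compare the two on the same data.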

This experiment, run by NVIDIA and powered by Intrinio data, is a GAME CHANGER for machine learning algorithm research. Using quality data and incredible tools like RAPIDS provides traders with an unparalleled advantage in short-term price prediction. 

With the combination of RAPIDS and Intrinio data, you can spot emerging trends, anticipate price fluctuations, and seize profitable opportunities in a fraction of the time it used to take. If you want to predict market movements faster than ever before, be sure to check out RAPIDS cuDF and cuML as potential replacements for the pandas and scikit-learn Python libraries.
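
Because cuML mirrors much of the scikit-learn API, trying it out can be as simple as changing an import. The sketch below prefers cuML and falls back to scikit-learn when cuML is not installed. The first_available helper, the random toy data, and the hyperparameters are our own assumptions for illustration; they are not NVIDIA's experiment, and random labels will not produce a useful model.

```python
import importlib


def first_available(module_names):
    """Import and return the first module in the list that is installed."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the GPU library, fall back to the CPU one.
ensemble = first_available(["cuml.ensemble", "sklearn.ensemble"])

if ensemble is None:
    print("Neither cuML nor scikit-learn is installed; skipping the demo.")
else:
    import numpy as np  # both libraries depend on NumPy

    rng = np.random.default_rng(0)
    # Toy stand-in for 40 LOB features and three class labels (0, 1, 2).
    X = rng.random((300, 40), dtype=np.float32)
    y = rng.integers(0, 3, size=300).astype(np.int32)

    model = ensemble.RandomForestClassifier(n_estimators=20)
    model.fit(X, y)
    print("backend:", type(model).__module__)
    print("sample predictions:", model.predict(X[:5]))
```

The same pattern applies to cuDF and pandas for the preprocessing step, although not every pandas feature has a cuDF equivalent, so test your own pipeline before switching.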

At Intrinio, we can get you started with a reliable set of data to use for training your models. On our website, you can request a consultation with one of our data experts, or chat with us live to get set up with a free trial of historical stock price data. The future of trading is here, and we’re ready to help you be part of it.
