Backtesting with ML-based investment strategies

If you create a Machine Learning model to predict the price of a stock:

Graphical representation of a decision tree model used to predict stock prices, illustrating how the feature space is divided to reach more accurate predictions. — F1. Representation of a decision tree model

How can you evaluate its performance if you apply it to your investment strategy?

Illustrative scheme of applying a Machine Learning model in an investment strategy, showing the decision-making process based on price predictions. — F2. Integration of Machine Learning in investment strategy

Data

We start with daily stock data for NVIDIA (ticker NVDA).

Check out this tutorial to learn how to preprocess the daily return of a stock.

import pandas as pd

df = pd.read_csv('data.csv', index_col='Date', parse_dates=True)

Capture of NVIDIA's stock data, showing a dataset ready for analysis and application of Machine Learning models, focusing on data preparation and cleaning. — F3. NVIDIA data prepared for ML analysis
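The dataset above already contains the target column. As a minimal sketch of how such a column could be built, assuming it is the next day's percentage change of the closing price expressed in percent (the prices and the small DataFrame below are hypothetical, not the article's data):

```python
import pandas as pd

# Hypothetical closing prices; in the article these come from data.csv.
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 104.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Percentage change versus the previous close, in percent.
change_today = prices["Close"].pct_change() * 100

# Shift one row up so each row holds the change that happens the next day.
prices["Change Tomorrow"] = change_today.shift(-1)

# The last row has no "tomorrow" yet, so it is dropped.
prices = prices.dropna()

print(prices)
```

Shifting the change backwards by one row is what lets the model learn from today's values to predict tomorrow's move.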

Questions

How is a Machine Learning model implemented to predict the change in closing price?
What is the role of the min_samples_leaf parameter in the DecisionTreeRegressor algorithm?
How do we measure the model’s error and what does it tell us about its performance?
How do we introduce a Machine Learning model into an investment strategy?
How do we evaluate the performance of the Machine Learning investment strategy?

Methodology

Feature selection

We want to predict the percentage change in tomorrow’s closing price. This will be the target variable, and the rest will be the explanatory ones.

target = 'Change Tomorrow'
y = df[target]
X = df.drop(columns=target)

Machine learning model

We will use the DecisionTreeRegressor algorithm to predict the change in the closing price.

We require at least 10 samples in every leaf (the end of each branch of the tree) by setting min_samples_leaf=10.

from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(min_samples_leaf=10)
model.fit(X, y)
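To see what min_samples_leaf does, here is a small sketch on synthetic data (the arrays X_demo and y_demo are hypothetical random noise, not the NVIDIA dataset). It compares an unrestricted tree with one that needs at least 10 samples per leaf:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))  # hypothetical features
y_demo = rng.normal(size=200)       # hypothetical target (pure noise)

# Unrestricted tree: keeps splitting until the training data is memorized.
tree_free = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Restricted tree: every leaf must contain at least 10 training samples.
tree_min10 = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X_demo, y_demo)

leaves_free = tree_free.get_n_leaves()
leaves_min10 = tree_min10.get_n_leaves()

# Count how many training samples land in each leaf of the restricted tree.
leaf_ids = tree_min10.apply(X_demo)
smallest_leaf = np.unique(leaf_ids, return_counts=True)[1].min()

print(leaves_free, leaves_min10, smallest_leaf)
```

The restricted tree ends up with far fewer leaves, so each prediction is an average over several observations rather than a copy of a single training value.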

Model evaluation

Using the rules that the fitted tree has learned, we calculate the model’s predictions:

y_pred = model.predict(X)

And compare them with the actual values, thus obtaining the error.

error = y - y_pred

To summarize the error in a single number, we calculate the root mean square error (RMSE). If the errors are roughly normally distributed around zero, about 68% of the predictions fall within one RMSE of the actual value.

error2 = error ** 2

MSE = error2.mean()
RMSE = MSE ** 0.5

In our case, the RMSE is 2.99: the model’s prediction of tomorrow’s percentage change typically misses the actual value by about 2.99 percentage points. Keep in mind that this error is measured on the same data the model was trained on.
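The 68% reading of the RMSE only holds when the errors behave like a zero-mean normal distribution. A quick sketch with simulated errors (the values below are randomly generated for illustration, not the model's actual errors) shows where the figure comes from:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical prediction errors in percentage points: normal, centered on zero.
errors = rng.normal(loc=0.0, scale=3.0, size=100_000)

# Same calculation as in the article: square, average, square root.
rmse = np.sqrt(np.mean(errors ** 2))

# Share of errors whose absolute size is at most one RMSE.
share_within = np.mean(np.abs(errors) <= rmse)

print(round(rmse, 2), round(share_within, 3))
```

With skewed or fat-tailed errors, which are common in daily returns, the share inside one RMSE can differ from 68%, so it is worth checking on the real errors.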

Is this an acceptable value for our investment strategy? How could we improve it?

I read your comments to design the next tutorial.

Now let’s continue with the implementation of the investment strategy using the backtesting.py library.

Create investment strategy

To implement the investment strategy, we create a class that inherits from backtesting.Strategy.

For this, the class requires two methods: init and next.

init: initializes the strategy with the previously calculated model.
next: calculates the prediction for tomorrow and decides whether to buy, sell, or do nothing.

from backtesting import Strategy

class MLStrategy(Strategy):
    def init(self):
        self.model = model
        
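    def next(self):
        # Features of the latest bar available at this step of the simulation
        X_today = self.data.df.iloc[[-1]]
        # predict() returns a one-element array; take the scalar forecast
        y_tomorrow = self.model.predict(X_today)[0]
        
        # Trade only when the forecast is larger than the model's typical error
        if y_tomorrow > RMSE:
            self.buy()
        elif y_tomorrow < -RMSE:
            self.sell()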
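The decision rule inside next can be isolated and tested without running a backtest. The helper below is a hypothetical illustration (decide is not part of backtesting.py) that uses the article's RMSE of 2.99 as the threshold:

```python
def decide(prediction: float, threshold: float) -> str:
    """Return the action for a predicted percentage change."""
    if prediction > threshold:
        return "buy"    # forecast rise is larger than the typical error
    if prediction < -threshold:
        return "sell"   # forecast fall is larger than the typical error
    return "hold"       # forecast is too small to act on


threshold = 2.99  # RMSE reported in the article

for prediction in (4.5, -3.2, 1.0):
    print(prediction, decide(prediction, threshold))
```

Using the RMSE as the threshold means the strategy only trades when the predicted move is larger than the model's typical miss.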

Backtest with trading conditions

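Next, we simulate the investment strategy (a backtest) under the following trading conditions to evaluate its performance: $1,000 of starting cash, a 0.2% commission per trade, each new order closing the previous position (exclusive_orders=True), and orders filled at the current bar's closing price (trade_on_close=True).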

from backtesting import Backtest

bt = Backtest(
  X, MLStrategy, cash=1_000, commission=.002,
  exclusive_orders=True, trade_on_close=True
)

results = bt.run()

In the backtest report, we observe that, after 1,533 days, we obtain a Final Equity of $11,019.19.

Summary of the results of a backtest applying a Machine Learning-based investment strategy, highlighting the final equity and total return obtained during the test period. — F4. Backtesting results of the ML strategy

However, it would have been both easier and more profitable to buy and hold the stock without a Machine Learning model: buy and hold returned 1,372.08%, versus 1,001.92% for the strategy.
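As a quick check of how the return percentages relate to equity, the arithmetic can be sketched as follows. The buy-and-hold equity is a hypothetical figure derived from the reported return and ignores commissions:

```python
cash = 1_000  # starting cash used in the backtest

strategy_return_pct = 1_001.92  # reported return of the ML strategy
buy_hold_return_pct = 1_372.08  # reported buy-and-hold return

# Final equity = starting cash grown by the total return.
strategy_equity = cash * (1 + strategy_return_pct / 100)
buy_hold_equity = cash * (1 + buy_hold_return_pct / 100)

print(f"ML strategy: ${strategy_equity:,.2f}")
print(f"Buy & hold:  ${buy_hold_equity:,.2f}")
```

The strategy figure lands within a cent of the reported Final Equity of $11,019.19; the small gap comes from rounding the return to two decimals.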

How could we improve the Machine Learning investment strategy? I read your comments.

Visualize backtest simulation

Finally, we visualize the backtest simulation to better understand the performance of the investment strategy.

bt.plot()

In addition to performance metrics, we observe one that is crucial for evaluating the investment strategy: the Drawdown.

This metric measures how far the equity falls from its previous peak, that is, how large a loss we would have had to endure without closing the position.

Interactive chart generated by the backtesting.py library, offering a detailed visualization of the investment strategy's performance over time, including key metrics such as drawdown. — F5. ML strategy performance simulation

In other words, Drawdown measures the risk of the investment strategy.
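A drawdown series can be computed from any equity curve by comparing each value with the highest value reached so far. This is a minimal sketch with a hypothetical equity curve, not the one produced by the backtest:

```python
import pandas as pd

# Hypothetical equity curve in dollars.
equity = pd.Series([1000.0, 1200.0, 900.0, 1100.0, 1500.0, 1350.0])

# Highest equity reached up to each point in time.
running_peak = equity.cummax()

# Percentage distance from the running peak (0 at a new high, negative below it).
drawdown_pct = (equity / running_peak - 1) * 100

# The worst peak-to-trough fall over the whole period.
max_drawdown_pct = drawdown_pct.min()

print(drawdown_pct.round(2).tolist())
print(max_drawdown_pct)
```

Here the worst fall is from 1,200 to 900, a drawdown of 25%, even though the curve later reaches a new high.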

If you want to delve into integrating Machine Learning models into investment strategies, I invite you to check out this course.

Conclusions

Machine Learning Model: DecisionTreeRegressor is a tree algorithm that splits the feature space into regions and predicts the average price change observed in each one.
Parameter min_samples_leaf: min_samples_leaf=10 limits overfitting by requiring a minimum number of samples in each leaf of the tree, which helps the model generalize.
Error Measurement: RMSE quantifies the typical deviation of predictions from actual values; about 68% of predictions fall within one RMSE when the errors are roughly normal and centered on zero.
Introduction in Investment Strategy: Strategy with init and next integrates the model’s predictions into trading decisions.
Performance Evaluation: Backtest allows us to simulate the investment strategy with customized trading conditions.
