Using Reinforcement Learning for Stock Trading with FinRL
Reinforcement learning (RL) has emerged as a powerful tool for developing autonomous agents that can learn optimal strategies through trial and error. In the realm of finance, RL offers the potential to create intelligent trading agents that can make informed decisions in complex and dynamic markets. FinRL, a Python library specifically designed for financial reinforcement learning, provides a convenient and efficient way to build and train these agents.
Machine Learning Theory Underpinning RL
To understand RL's power, it's essential to grasp the underlying machine learning concepts:
- Markov Decision Processes (MDPs): RL problems are typically formulated as MDPs, which define a mathematical framework for decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. An MDP is defined by:
- States (S): A set of possible situations the agent can be in.
- Actions (A): A set of actions the agent can take in each state.
- Transition Probabilities (P(s'|s, a)): The probability of transitioning from state s to state s' after taking action a.
- Rewards (R(s, a)): The immediate reward received after taking action a in state s.
- Value Functions: Value functions estimate the "goodness" of being in a particular state or taking a specific action in a state.
- State-Value Function (V(s)): The expected cumulative reward starting in state s and following a given policy.
- Action-Value Function (Q(s, a)): The expected cumulative reward starting in state s, taking action a, and then following a given policy.
- Bellman Equations: These equations provide a recursive relationship between the value of a state and the values of its successor states, and they are fundamental to many RL algorithms. For example, the Bellman equation for the optimal Q-function is:
Q*(s, a) = E[R(s, a) + γ * max_a' Q*(s', a')]
- where γ is the discount factor, which determines how heavily future rewards are weighted relative to immediate ones. (A minimal tabular Q-learning sketch after this list shows this update in code.)
- Policy Optimization: The goal of RL is to find an optimal policy, π*(s), that maximizes the expected cumulative reward. Policy optimization methods directly search for this optimal policy.
- Value-Based Methods: Learn a value function (e.g., Q-learning) and then derive a policy from it.
- Policy Gradient Methods: Directly learn a policy by optimizing its parameters using gradient ascent.
- Deep Learning in RL: Deep neural networks are often used to approximate value functions or policies, especially in high-dimensional state spaces. This combination of deep learning and RL has led to significant breakthroughs in various domains.
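To make these ideas concrete, here is a minimal, self-contained tabular Q-learning sketch on a toy MDP. The states, actions, rewards, and transition rule below are invented purely for illustration; the point is to show an epsilon-greedy policy derived from a Q-table and the Bellman update described above.
Python
import random

# Toy MDP: 3 states, 2 actions; the dynamics below are invented for illustration only.
N_STATES, N_ACTIONS = 3, 2
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1  # discount factor, learning rate, exploration rate

def step(state, action):
    """Return (next_state, reward) for the toy MDP."""
    next_state = (state + action + 1) % N_STATES          # deterministic toy transition
    reward = 1.0 if next_state == N_STATES - 1 else 0.0   # reward for reaching the last state
    return next_state, reward

# Tabular action-value function Q(s, a), initialised to zero
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

state = 0
for _ in range(10_000):
    # Epsilon-greedy policy derived from the current Q-table
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])

    next_state, reward = step(state, action)

    # Bellman update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (td_target - Q[state][action])
    state = next_state

print(Q)  # learned action values for the toy MDP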
Setting up FinRL
To get started with FinRL, you need to install it along with its dependencies:
Bash
pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
Building a Trading Environment
FinRL provides pre-built trading environments that simulate real-world market conditions. Here's how to create a basic environment:
Python
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv

# Define the training data
df = ...  # your preprocessed stock data in a pandas DataFrame
stock_dim = 1  # number of tickers in df

# Create the environment (arguments mirror the StockTradingEnv example later in this post)
env = StockTradingEnv(
    df=df, stock_dim=stock_dim, hmax=100, initial_amount=10000,
    num_stock_shares=[0] * stock_dim,
    buy_cost_pct=[0.001] * stock_dim, sell_cost_pct=[0.001] * stock_dim,
    reward_scaling=1e-4, state_space=1 + 2 * stock_dim,
    action_space=stock_dim, tech_indicator_list=[],
)
Training an RL Agent
FinRL integrates with popular RL libraries like Stable-Baselines3, making it easy to train various RL agents. Here's an example of training a PPO agent:
Python
from stable_baselines3 import PPO
# Initialize the agent
model = PPO("MlpPolicy", env, verbose=1)
# Train the agent
model.learn(total_timesteps=10000)
Evaluating the Agent
After training, you can evaluate the agent's performance in a backtesting environment:
Python
# Create a backtesting environment on held-out data
backtest_env = StockTradingEnv(
    df=df_test, stock_dim=stock_dim, hmax=100, initial_amount=10000,
    num_stock_shares=[0] * stock_dim,
    buy_cost_pct=[0.001] * stock_dim, sell_cost_pct=[0.001] * stock_dim,
    reward_scaling=1e-4, state_space=1 + 2 * stock_dim,
    action_space=stock_dim, tech_indicator_list=[],
)

# Evaluate the agent by stepping through the test period
obs = backtest_env.reset()
for _ in range(len(df_test)):
    action, _states = model.predict(obs)
    obs, reward, done, info = backtest_env.step(action)
    if done:
        break

# Plot the account value recorded by the environment during the backtest
import matplotlib.pyplot as plt

df_account_value = backtest_env.save_asset_memory()
plt.plot(df_account_value["account_value"])
plt.show()
FinRL in Action: Code Examples
Let's illustrate how to use FinRL with some practical code examples. We'll focus on a basic stock trading scenario using a Proximal Policy Optimization (PPO) agent; FinRL's stock trading environment uses a continuous action space, which suits policy-optimization algorithms like PPO better than value-based agents such as DQN.
1. Installation:
First, install FinRL and its dependencies:
Bash
pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
pip install stable-baselines3[extra]  # Stable-Baselines3 with optional extras (e.g., TensorBoard logging)
2. Data Loading and Preprocessing:
FinRL provides convenient functions for downloading and preprocessing stock data.
Python
import finrl
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.config import INDICATORS
from finrl.config_tickers import DOW_30_TICKER
# Download data
df = YahooDownloader(
    start_date="2009-01-01",
    end_date="2021-10-31",
    ticker_list=DOW_30_TICKER,
).download_data()

# Feature Engineering
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=INDICATORS,
    use_turbulence=True,
    user_defined_feature=False,
)
processed = fe.preprocess_data(df)
# Data Split
train = data_split(processed, "2009-01-01", "2019-01-01")
trade = data_split(processed, "2019-01-01", "2021-10-31")
# Create a FinRL training environment
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv

stock_dim = len(train.tic.unique())  # tickers actually present after preprocessing
state_space = 1 + 2 * stock_dim + len(INDICATORS) * stock_dim

env = StockTradingEnv(
    df=train,
    stock_dim=stock_dim,
    hmax=100,
    initial_amount=1000000,
    num_stock_shares=[0] * stock_dim,   # start with no shares held
    buy_cost_pct=[0.001] * stock_dim,   # 0.1% transaction cost per stock
    sell_cost_pct=[0.001] * stock_dim,
    reward_scaling=1e-4,
    state_space=state_space,
    action_space=stock_dim,
    tech_indicator_list=INDICATORS,
    print_verbosity=5,
)
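As an aside, FinRL's environments also provide a get_sb_env() helper that wraps them in a Stable-Baselines3 DummyVecEnv; the official FinRL tutorials typically hand this wrapped environment to DRLAgent, although passing the raw environment (as in the next step) also works because Stable-Baselines3 wraps unvectorized environments automatically.
Python
# Optional: wrap the environment for Stable-Baselines3, as in the FinRL tutorials
env_train, _ = env.get_sb_env()  # returns a DummyVecEnv and the initial observation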
3. Training the PPO Agent:
Now, we can train a PPO agent using Stable-Baselines3, which is integrated with FinRL through the DRLAgent wrapper. (DRLAgent supports continuous-control algorithms such as PPO, A2C, DDPG, TD3, and SAC, which match the environment's continuous action space.)
Python
from finrl.agents.stablebaselines3.models import DRLAgent

agent = DRLAgent(env=env)
model_ppo = agent.get_model("ppo")

# Reduced timesteps for demonstration; real experiments typically use far more
trained_ppo = agent.train_model(model=model_ppo, tb_log_name="ppo", total_timesteps=10000)

# Save the trained model
trained_ppo.save("ppo_model")
4. Backtesting:
Finally, we can backtest the trained agent on the trading data.
Python
# Create a trading (backtest) environment for the held-out period
trade_env = StockTradingEnv(
    df=trade,
    stock_dim=stock_dim,
    hmax=100,
    initial_amount=1000000,
    num_stock_shares=[0] * stock_dim,
    buy_cost_pct=[0.001] * stock_dim,
    sell_cost_pct=[0.001] * stock_dim,
    reward_scaling=1e-4,
    state_space=state_space,
    action_space=stock_dim,
    tech_indicator_list=INDICATORS,
    print_verbosity=5,
)
# Load the trained model
from stable_baselines3 import PPO
trained_ppo = PPO.load("ppo_model")

# Backtest
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ppo, environment=trade_env
)
# Plot results (Optional)
import matplotlib.pyplot as plt
plt.plot(df_account_value.account_value)
plt.show()
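To go beyond a raw equity curve, FinRL's plotting module includes a backtest_stats helper (built on pyfolio) that summarizes an account-value series. A minimal sketch, assuming the df_account_value produced above:
Python
from finrl.plot import backtest_stats

# Summary statistics such as annual return, Sharpe ratio, and max drawdown
perf_stats = backtest_stats(account_value=df_account_value)
print(perf_stats)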
This code provides a basic example of using FinRL for stock trading. You can experiment with different RL algorithms, state representations, reward functions, and hyperparameters to improve performance.
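For example, swapping algorithms or adjusting hyperparameters through DRLAgent only takes a couple of lines. A sketch reusing the training setup above; the hyperparameter values are arbitrary examples, not tuned recommendations:
Python
# Try a different algorithm (A2C) with custom hyperparameters
model_a2c = agent.get_model("a2c", model_kwargs={"learning_rate": 0.0005, "n_steps": 10})
trained_a2c = agent.train_model(model=model_a2c, tb_log_name="a2c", total_timesteps=10000)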
Conclusion
FinRL simplifies the process of developing and training RL agents for stock trading. By leveraging its pre-built environments and integration with Stable-Baselines3, you can quickly experiment with different RL algorithms and strategies. However, it's crucial to remember that stock trading involves inherent risks, and no algorithm can guarantee profits. Thorough backtesting and careful consideration of market conditions are essential before deploying any trading strategy in a live environment.
Taken together, the theory overview, practical setup, code examples, and caveats above provide a well-rounded introduction to using RL for stock trading, and should give readers a solid starting point for their own exploration.
Additional Learning Materials
- AI4Finance-Foundation - FinRL
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Continuous control with deep reinforcement learning
- Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
- FinRL for Quantitative Finance: Install and Setup Tutorial for Beginners
Code Repository & Models