Ever wanted to bring a T-Rex back to life? Well, digitally at least! In this post, we'll explore how to create a simplified T-Rex model in MuJoCo, a powerful physics simulator, and then lay the groundwork for using reinforcement learning (RL) to teach it how to walk. This isn't just a fun project; it's a fantastic way to dive into the world of robotics, simulation, and AI.
Step 1: Building Our Digital Dino (The MuJoCo Model)
MuJoCo uses XML files to define models. We'll start with a T-Rex model designed for bipedal (two-legged) locomotion. This is crucial because it gives our digital dinosaur the potential to walk. Here's the XML (save it as, for example, trex.xml):
XML
<mujoco model="T-Rex">
  <compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
  <default>
    <joint limited="true" damping="1" armature="0.1"/>
    <geom conaffinity="1" condim="3" friction="1 0.5 0.5" margin="0.01" solref="0.02 1"/>
  </default>
  <worldbody>
    <light directional="true" diffuse="0.8 0.8 0.8" pos="0 0 5" dir="0 0 -1"/>
    <geom type="plane" size="50 50 0.1" rgba="0.2 0.3 0.4 1"/>

    <body name="torso" pos="0 0 1.5">
      <camera name="track" mode="track" pos="0 -5 2" xyaxes="1 0 0 0 1 0"/>
      <!-- Root joint: the torso can translate and rotate freely. Free joints cannot have
           limits, and damping/armature on the root would act like drag, so we override
           the defaults here. -->
      <joint name="root" type="free" limited="false" damping="0" armature="0"/>
      <geom name="torso_geom" type="capsule" fromto="0 0 0 0 0 -0.7" size="0.3"/>

      <!-- Left leg -->
      <body name="left_thigh" pos="0.2 0 -0.7">
        <joint name="left_hip" type="hinge" pos="0 0 0" axis="0 1 0" range="-90 45"/>
        <geom name="left_thigh_geom" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.15"/>
        <body name="left_shin" pos="0 0 -0.5">
          <joint name="left_knee" type="hinge" pos="0 0 0" axis="0 1 0" range="-160 0"/>
          <geom name="left_shin_geom" type="capsule" fromto="0 0 0 0 0 -0.6" size="0.1"/>
          <body name="left_foot" pos="0 0 -0.6">
            <joint name="left_ankle" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 45"/>
            <geom name="left_foot_geom" type="capsule" fromto="0 0 0 0.2 0 -0.1" size="0.08"/>
            <body name="left_toe" pos="0.2 0 -0.1">
              <joint name="left_toe_joint" type="hinge" axis="0 1 0" range="-20 45"/>
              <geom name="left_toe_geom" type="capsule" fromto="0 0 0 0.2 0 0" size="0.06"/>
            </body>
          </body>
        </body>
      </body>

      <!-- Right leg -->
      <body name="right_thigh" pos="-0.2 0 -0.7">
        <joint name="right_hip" type="hinge" pos="0 0 0" axis="0 1 0" range="-90 45"/>
        <geom name="right_thigh_geom" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.15"/>
        <body name="right_shin" pos="0 0 -0.5">
          <joint name="right_knee" type="hinge" pos="0 0 0" axis="0 1 0" range="-160 0"/>
          <geom name="right_shin_geom" type="capsule" fromto="0 0 0 0 0 -0.6" size="0.1"/>
          <body name="right_foot" pos="0 0 -0.6">
            <joint name="right_ankle" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 45"/>
            <geom name="right_foot_geom" type="capsule" fromto="0 0 0 0.2 0 -0.1" size="0.08"/>
            <body name="right_toe" pos="0.2 0 -0.1">
              <joint name="right_toe_joint" type="hinge" axis="0 1 0" range="-20 45"/>
              <geom name="right_toe_geom" type="capsule" fromto="0 0 0 0.2 0 0" size="0.06"/>
            </body>
          </body>
        </body>
      </body>

      <!-- Left arm -->
      <body name="left_upper_arm" pos="0.3 0 -0.2">
        <joint name="left_shoulder" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 90"/>
        <geom name="left_upper_arm_geom" type="capsule" fromto="0 0 0 0 0 -0.3" size="0.07"/>
        <body name="left_lower_arm" pos="0 0 -0.3">
          <joint name="left_elbow" type="hinge" axis="0 1 0" range="0 90"/>
          <geom type="capsule" fromto="0 0 0 0 0 -0.3" size="0.05"/>
        </body>
      </body>

      <!-- Right arm -->
      <body name="right_upper_arm" pos="-0.3 0 -0.2">
        <joint name="right_shoulder" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 90"/>
        <geom name="right_upper_arm_geom" type="capsule" fromto="0 0 0 0 0 -0.3" size="0.07"/>
        <body name="right_lower_arm" pos="0 0 -0.3">
          <joint name="right_elbow" type="hinge" axis="0 1 0" range="0 90"/>
          <geom type="capsule" fromto="0 0 0 0 0 -0.3" size="0.05"/>
        </body>
      </body>

      <!-- Neck and head -->
      <body name="neck" pos="0 0 -0.1">
        <joint name="neck_joint" type="hinge" axis="0 1 0" range="-45 45"/>
        <geom name="neck_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.1"/>
        <body name="head" pos="0 0 -0.4">
          <joint name="head_joint" type="hinge" axis="0 1 0" range="-20 20"/>
          <geom name="head_geom" type="sphere" size="0.2"/>
        </body>
      </body>

      <!-- Tail -->
      <body name="tail_base" pos="0 0 -0.7">
        <joint name="tail1" type="hinge" axis="0 1 0" range="-20 20"/>
        <geom name="tail_base_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.2"/>
        <body name="tail_mid" pos="0 0 -0.4">
          <joint name="tail2" type="hinge" axis="0 1 0" range="-15 15"/>
          <geom name="tail_mid_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.15"/>
          <body name="tail_end" pos="0 0 -0.4">
            <joint name="tail3" type="hinge" axis="0 1 0" range="-10 10"/>
            <geom name="tail_end_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.1"/>
          </body>
        </body>
      </body>
    </body>
  </worldbody>

  <actuator>
    <!-- Legs -->
    <motor joint="left_hip" gear="50" ctrlrange="-90 45" ctrllimited="true"/>
    <motor joint="left_knee" gear="50" ctrlrange="-160 0" ctrllimited="true"/>
    <motor joint="left_ankle" gear="30" ctrlrange="-45 45" ctrllimited="true"/>
    <motor joint="left_toe_joint" gear="20" ctrlrange="-20 45" ctrllimited="true"/>
    <motor joint="right_hip" gear="50" ctrlrange="-90 45" ctrllimited="true"/>
    <motor joint="right_knee" gear="50" ctrlrange="-160 0" ctrllimited="true"/>
    <motor joint="right_ankle" gear="30" ctrlrange="-45 45" ctrllimited="true"/>
    <motor joint="right_toe_joint" gear="20" ctrlrange="-20 45" ctrllimited="true"/>
    <!-- Arms -->
    <motor joint="left_shoulder" gear="20" ctrlrange="-45 90" ctrllimited="true"/>
    <motor joint="left_elbow" gear="15" ctrlrange="0 90" ctrllimited="true"/>
    <motor joint="right_shoulder" gear="20" ctrlrange="-45 90" ctrllimited="true"/>
    <motor joint="right_elbow" gear="15" ctrlrange="0 90" ctrllimited="true"/>
    <!-- Neck, head, and tail -->
    <motor joint="neck_joint" gear="20" ctrlrange="-45 45" ctrllimited="true"/>
    <motor joint="head_joint" gear="10" ctrlrange="-20 20" ctrllimited="true"/>
    <motor joint="tail1" gear="30" ctrlrange="-20 20" ctrllimited="true"/>
    <motor joint="tail2" gear="20" ctrlrange="-15 15" ctrllimited="true"/>
    <motor joint="tail3" gear="10" ctrlrange="-10 10" ctrllimited="true"/>
  </actuator>
</mujoco>
Let's break down the key parts (a short Python sanity check follows the list):
- <mujoco model="T-Rex">: The root element, giving our model a name.
- <compiler ...>: Sets some global options for the simulation. inertiafromgeom="true" is generally a good idea, since it computes body masses and inertias from the geoms.
- <default> ... </default>: Defines default properties for joints and geometries, making the XML less repetitive.
- <worldbody> ... </worldbody>: Contains everything that is physically part of the simulated world.
- <light ...>: Adds a light source for better visualization.
- <geom type="plane" ...>: Creates the ground.
- <body> ... </body>: Defines a body part. Bodies are nested to create a hierarchical structure (e.g., the thigh is the parent of the shin).
- joint: Defines how a body part can move relative to its parent. type="hinge" simulates joints like knees and elbows, and range limits the joint's motion, which is very important for realistic movement. The free joint on the torso allows it to translate and rotate freely.
- geom: Defines the shape of the body part. type="capsule" is good for limbs, and type="sphere" is used for the head.
- <actuator> ... </actuator>: This is where we define the "muscles" of our T-Rex. Each motor applies force to a joint, gear scales its output (like a gear ratio), and ctrlrange together with ctrllimited restrict the motor's control inputs.
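With those pieces in mind, it's worth confirming that the XML actually compiles and that every joint and motor shows up as expected. Here is a small sketch using the official mujoco Python bindings; it assumes trex.xml is saved in the current working directory.
Python
import mujoco

# Load and compile the model; this raises an exception if the XML is malformed.
model = mujoco.MjModel.from_xml_path("trex.xml")

# Basic sizes: nq/nv are the position/velocity dimensions, nu is the number of actuators.
print(f"nq={model.nq}, nv={model.nv}, nu={model.nu}")

# List every joint and actuator by name, so typos in the XML are easy to spot.
for i in range(model.njnt):
    print("joint:", mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_JOINT, i))
for i in range(model.nu):
    print("actuator:", mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_ACTUATOR, i))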
Step 2: Setting up the MuJoCo Environment (Python)
We'll use Python and the mujoco library to interact with our model. First, install MuJoCo along with mediapy, which we'll use to display the rendered frames:
Bash
pip install mujoco mediapy
Now, let's write a basic script to load and visualize the model:
Python
import mujoco
import numpy as np
import mediapy as media  # For displaying the simulation

# Load the XML model
model = mujoco.MjModel.from_xml_path("trex.xml")
data = mujoco.MjData(model)

# Create a renderer
renderer = mujoco.Renderer(model)

# Simulation loop
duration = 5     # seconds of simulated time
framerate = 60   # capture frames at 60 fps instead of every physics step
frames = []
mujoco.mj_resetData(model, data)  # Reset the simulation
while data.time < duration:
    mujoco.mj_step(model, data)  # Advance the simulation by one timestep
    # Render a frame whenever the video is "behind" the simulation clock
    if len(frames) < data.time * framerate:
        renderer.update_scene(data, camera="track")  # Use the tracking camera from the XML
        frames.append(renderer.render())

# Display the simulation as a video
media.show_video(frames, fps=framerate)
print("Simulation Complete")
This script does the following:
- Loads the model: mujoco.MjModel.from_xml_path("trex.xml") reads and compiles your XML file.
- Creates the data: mujoco.MjData(model) creates the data structure that stores the simulation state (positions, velocities, forces, time).
- Creates a renderer: this will be used to render and save frames of the simulation.
- Runs the simulation loop: mujoco.mj_resetData(model, data) resets the simulation to its initial state, and mujoco.mj_step(model, data) is the core function that advances the physics by one timestep. renderer.update_scene(data, camera="track") and renderer.render() render the scene from the tracking camera defined in the XML; frames are captured at a fixed 60 fps rather than at every physics step, and each rendered frame is appended to the frames list.
- Displays the video: mediapy turns the collected frames into a video.
Run this script. You should see your T-Rex model in MuJoCo, but it will just fall to the ground – we haven't taught it to walk yet!
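Before bringing in RL, it helps to see how the motors are driven at all: the model's actuators map one-to-one onto data.ctrl, so writing values into that array applies torques on the next mj_step. Below is a small sketch that reuses model and data from the script above and sends random control signals just to confirm every joint responds; the 0.1-second hold time and 3-second duration are arbitrary choices, not a gait.
Python
import numpy as np

mujoco.mj_resetData(model, data)
steps_per_action = 50  # hold each random command for 0.1 s (50 steps at the default 2 ms timestep)
while data.time < 3:
    # Draw a random target within each motor's ctrlrange; values outside it would be
    # clamped anyway because ctrllimited="true".
    low = model.actuator_ctrlrange[:, 0]
    high = model.actuator_ctrlrange[:, 1]
    data.ctrl[:] = np.random.uniform(low, high)
    for _ in range(steps_per_action):
        mujoco.mj_step(model, data)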
Step 3: The Reinforcement Learning Magic
This is where things get exciting (and challenging). Reinforcement learning (RL) is a type of machine learning where an "agent" (our T-Rex) learns to interact with an "environment" (the MuJoCo simulation) to achieve a goal (walking). The agent learns by trial and error, receiving "rewards" for actions that get it closer to the goal and "penalties" for actions that don't.
Here's a high-level overview of the RL process (a minimal environment sketch follows the list):
1. Observation: The agent observes the current state of the environment. This might include the positions and velocities of all its joints, and contact forces.
2. Action: Based on its observation, the agent chooses an action. In our case, this means setting the target angles (or torques) for each of the T-Rex's motors.
3. Reward: The environment provides a reward based on how good the action was. A good reward function for walking might include:
   - A positive reward for forward movement.
   - A small negative reward for using too much energy (to encourage efficiency).
   - A large negative reward for falling over.
4. State transition: The environment updates its state based on the agent's action. This is where MuJoCo's physics engine comes in.
5. Repeat: Steps 1-4 are repeated until the agent learns a good policy (a strategy for choosing actions).
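To make this loop concrete, here is a minimal sketch of how the MuJoCo model could be wrapped as a Gymnasium environment. It is not a polished implementation: the observation layout, the frame skip, the reward weights, the fall-detection height, and the episode length are all illustrative assumptions that would need tuning for real training.
Python
import gymnasium as gym
import mujoco
import numpy as np


class TRexEnv(gym.Env):
    """Illustrative Gymnasium wrapper around trex.xml (all thresholds are placeholders)."""

    def __init__(self, xml_path="trex.xml", frame_skip=5):
        self.model = mujoco.MjModel.from_xml_path(xml_path)
        self.data = mujoco.MjData(self.model)
        self.frame_skip = frame_skip  # physics steps per RL action

        obs_dim = self.model.nq + self.model.nv  # joint positions + velocities
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (obs_dim,), np.float64)
        # Actions are control signals, one per motor, normalized to [-1, 1].
        self.action_space = gym.spaces.Box(-1.0, 1.0, (self.model.nu,), np.float32)

    def _get_obs(self):
        return np.concatenate([self.data.qpos, self.data.qvel])

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        mujoco.mj_resetData(self.model, self.data)
        return self._get_obs(), {}

    def step(self, action):
        # Map the normalized action onto each motor's ctrlrange.
        low = self.model.actuator_ctrlrange[:, 0]
        high = self.model.actuator_ctrlrange[:, 1]
        self.data.ctrl[:] = low + (action + 1.0) * 0.5 * (high - low)

        x_before = self.data.qpos[0]  # free-joint x position of the torso
        for _ in range(self.frame_skip):
            mujoco.mj_step(self.model, self.data)
        x_after = self.data.qpos[0]

        dt = self.model.opt.timestep * self.frame_skip
        forward_reward = (x_after - x_before) / dt             # reward forward progress
        energy_cost = 0.001 * np.square(self.data.ctrl).sum()  # discourage wasted effort
        torso_height = self.data.qpos[2]
        fell = torso_height < 0.7                               # assumed fall threshold
        reward = forward_reward - energy_cost - (10.0 if fell else 0.0)

        terminated = bool(fell)
        truncated = self.data.time > 20.0                       # assumed episode length
        return self._get_obs(), reward, terminated, truncated, {}
With frame_skip=5 and the default 2 ms timestep, the policy acts at 100 Hz, which is a common control rate for simulated locomotion.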
Choosing an RL Algorithm
Several RL algorithms could be used, including:
- Proximal Policy Optimization (PPO): A popular and relatively stable algorithm. It's a good starting point, and it's what the training sketch after this list uses.
- Soft Actor-Critic (SAC): Another popular choice, often good for continuous action spaces (like our motor controls).
- TD3 (Twin Delayed Deep Deterministic Policy Gradient): A good option for continuous control tasks.
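As a starting point, here is roughly what training the wrapper above with Stable Baselines3's PPO implementation could look like (installable with pip install stable-baselines3). The hyperparameters are the library defaults and the timestep budget is a rough guess; real locomotion runs typically need millions of steps plus reward shaping.
Python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = TRexEnv("trex.xml")
check_env(env)  # sanity-check that the environment follows the Gymnasium API

# PPO with a default MLP policy; hyperparameters left at library defaults.
agent = PPO("MlpPolicy", env, verbose=1)
agent.learn(total_timesteps=1_000_000)  # budget is a guess, not a recommendation
agent.save("trex_ppo")

# Roll out the trained policy for a quick qualitative check.
obs, _ = env.reset()
for _ in range(1000):
    action, _ = agent.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()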
Additional Learning Materials
- Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots
- Getting Started with Stable Baselines3 Python Reinforcement Learning Library | MuJoCo Humanoid-v4