From Zero to Dino-Roar: Teaching a T-Rex to Walk with MuJoCo and Reinforcement Learning

Michael Kudlaty
Michael Kudlaty
March 1, 2025

Ever wanted to bring a T-Rex back to life?  Well, digitally at least!  In this post, we'll explore how to create a simplified T-Rex model in MuJoCo, a powerful physics simulator, and then lay the groundwork for using reinforcement learning (RL) to teach it how to walk.  This isn't just a fun project; it's a fantastic way to dive into the world of robotics, simulation, and AI.

Step 1: Building Our Digital Dino (The MuJoCo Model)

MuJoCo uses XML files to define models.  We'll start with a T-Rex model designed for bipedal (two-legged) locomotion.  This is crucial because it gives our digital dinosaur the potential to walk.  Here's the XML (save it as, for example, trex.xml):

XML

<mujoco model="T-Rex">
 <compiler angle="degree" coordinate="local" inertiafromgeom="true"/>

 <default>
   <joint limited="true" damping="1" armature="0.1"/>
   <geom conaffinity="1" condim="3" friction="1 0.5 0.5" margin="0.01" solref="0.02 1"/>
 </default>

 <worldbody>
   <light directional="true" diffuse="0.8 0.8 0.8" pos="0 0 5" dir="0 0 -1"/>
   <geom type="plane" size="50 50 0.1" rgba="0.2 0.3 0.4 1"/>

   <body name="torso" pos="0 0 1.5">  <camera name="track" mode="track" pos="0 -5 2" xyaxes="1 0 0 0 1 0"/>
     <joint type="free"/> <geom name="torso_geom" type="capsule" fromto="0 0 0 0 0 -0.7" size="0.3"/> <body name="left_thigh" pos="0.2 0 -0.7">
       <joint name="left_hip" type="hinge" pos="0 0 0" axis="0 1 0" range="-90 45"/>
       <geom name="left_thigh_geom" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.15"/>

       <body name="left_shin" pos="0 0 -0.5">
         <joint name="left_knee" type="hinge" pos="0 0 0" axis="0 1 0" range="-160 0"/>
         <geom name="left_shin_geom" type="capsule" fromto="0 0 0 0 0 -0.6" size="0.1"/>

         <body name="left_foot" pos="0 0 -0.6">
             <joint name="left_ankle" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 45"/>
             <geom name="left_foot_geom" type="capsule" fromto="0 0 0 0.2 0 -0.1" size="0.08"/>
             <body name="left_toe" pos="0.2 0 -0.1">
                 <joint name="left_toe_joint" type="hinge" axis="0 1 0" range="-20 45"/>
                 <geom name="left_toe_geom" type="capsule" fromto="0 0 0 0.2 0 0" size="0.06"/>
             </body>
         </body>
       </body>
     </body>

     <body name="right_thigh" pos="-0.2 0 -0.7">
       <joint name="right_hip" type="hinge" pos="0 0 0" axis="0 1 0" range="-90 45"/>
       <geom name="right_thigh_geom" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.15"/>

       <body name="right_shin" pos="0 0 -0.5">
         <joint name="right_knee" type="hinge" pos="0 0 0" axis="0 1 0" range="-160 0"/>
         <geom name="right_shin_geom" type="capsule" fromto="0 0 0 0 0 -0.6" size="0.1"/>

         <body name="right_foot" pos="0 0 -0.6">
           <joint name="right_ankle" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 45"/>
           <geom name="right_foot_geom" type="capsule" fromto="0 0 0 0.2 0 -0.1" size="0.08"/>
           <body name="right_toe" pos="0.2 0 -0.1">
                 <joint name="right_toe_joint" type="hinge" axis="0 1 0" range="-20 45"/>
                 <geom name="right_toe_geom" type="capsule" fromto="0 0 0 0.2 0 0" size="0.06"/>
             </body>
         </body>
       </body>
     </body>

     <body name="left_upper_arm" pos="0.3 0 -0.2">
           <joint name="left_shoulder" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 90"/>
           <geom name="left_upper_arm_geom" type="capsule" fromto="0 0 0 0 0 -0.3" size="0.07"/>
           <body name="left_lower_arm" pos="0 0 -.3">
               <joint name="left_elbow" type="hinge" axis="0 1 0" range="0 90"/>
               <geom type="capsule" fromto="0 0 0 0 0 -0.3" size="0.05"/>
           </body>
       </body>

       <body name="right_upper_arm" pos="-0.3 0 -0.2">
           <joint name="right_shoulder" type="hinge" pos="0 0 0" axis="0 1 0" range="-45 90"/>
           <geom name="right_upper_arm_geom" type="capsule" fromto="0 0 0 0 0 -0.3" size="0.07"/>
           <body name="right_lower_arm" pos="0 0 -.3">
               <joint name="right_elbow" type="hinge" axis="0 1 0" range="0 90"/>
               <geom type="capsule" fromto="0 0 0 0 0 -0.3" size="0.05"/>
           </body>
       </body>

       <body name="neck" pos="0 0 -0.1">  <joint name="neck_joint" type="hinge" axis="0 1 0" range="-45 45"/>
             <geom name="neck_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.1"/>

             <body name="head" pos="0 0 -0.4">
               <joint name="head_joint" type="hinge" axis="0 1 0" range="-20 20"/>
               <geom name="head_geom" type="sphere" size="0.2"/>  </body>
       </body>
       <body name="tail_base" pos="0 0 -0.7">
           <joint name="tail1" type="hinge" axis="0 1 0" range="-20 20"/>
           <geom name="tail_base_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.2"/>

           <body name="tail_mid" pos="0 0 -0.4">
               <joint name="tail2" type="hinge" axis="0 1 0" range="-15 15" />
               <geom name="tail_mid_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.15"/>

               <body name="tail_end" pos="0 0 -0.4">
                 <joint name="tail3" type="hinge" axis="0 1 0" range="-10 10"/>
                 <geom name="tail_end_geom" type="capsule" fromto="0 0 0 0 0 -0.4" size="0.1"/>
               </body>
           </body>
       </body>

   </body>
 </worldbody>

<actuator>
   <motor joint="left_hip" gear="50" ctrlrange="-90 45" ctrllimited="true"/>
   <motor joint="left_knee" gear="50" ctrlrange="-160 0" ctrllimited="true"/>
   <motor joint="left_ankle" gear="30" ctrlrange="-45 45" ctrllimited="true"/>
   <motor joint="left_toe_joint" gear="20" ctrlrange="-20 45" ctrllimited="true"/>

   <motor joint="right_hip" gear="50" ctrlrange="-90 45" ctrllimited="true"/>
   <motor joint="right_knee" gear="50" ctrlrange="-160 0" ctrllimited="true"/>
   <motor joint="right_ankle" gear="30" ctrlrange="-45 45" ctrllimited="true"/>
   <motor joint="right_toe_joint" gear="20" ctrlrange="-20 45" ctrllimited="true"/>

   <motor joint="left_shoulder" gear="20" ctrlrange="-45 90" ctrllimited="true"/>
   <motor joint="left_elbow" gear="15" ctrlrange="0 90" ctrllimited="true"/>
   <motor joint="right_shoulder" gear="20" ctrlrange="-45 90" ctrllimited="true"/>
   <motor joint="right_elbow" gear="15" ctrlrange="0 90" ctrllimited="true"/>

   <motor joint="neck_joint" gear="20" ctrlrange="-45 45" ctrllimited="true"/>
   <motor joint="head_joint" gear="10" ctrlrange="-20 20" ctrllimited="true"/>

   <motor joint="tail1" gear="30" ctrlrange="-20 20" ctrllimited="true"/>
    <motor joint="tail2" gear="20" ctrlrange="-15 15" ctrllimited="true"/>
    <motor joint="tail3" gear="10" ctrlrange="-10 10" ctrllimited="true"/>
 </actuator>
</mujoco>

Let's break down the key parts:

  • <mujoco model="T-Rex">:  The root element, giving our model a name.
  • <compiler ...>:  Sets some global options for the simulation.  inertiafromgeom="true" is generally a good idea.
  • <default> ... </default>: Defines default properties for joints and geometries, making the XML less repetitive.
  • <worldbody> ... </worldbody>: Contains everything that's visually part of the simulation.
    • <light ...>: Adds a light source for better visualization.
    • <geom type="plane" ...>:  Creates the ground.
    • <body> ... </body>:  Defines a body part.  Bodies are nested to create a hierarchical structure (e.g., the thigh is the parent of the shin).
      • joint: Defines how a body part can move relative to its parent.  type="hinge" simulates joints like knees and elbows.  range limits the joint's motion – very important for realistic movement. The free joint for the torso allows for translation and rotation.
      • geom:  Defines the shape of the body part.  type="capsule" is good for limbs, and type="sphere" is used for the head.
  • <actuator> ... </actuator>:  This is where we define the "muscles" of our T-Rex.  motor creates a motor that applies force to a joint. gear simulates a gear ratio. ctrlrange and ctrllimited are used to restrict the control inputs of the motor.

Step 2: Setting up the MuJoCo Environment (Python)

We'll use Python and the mujoco library to interact with our model.  First, install MuJoCo:

Bash

pip install mujoco

Now, let's write a basic script to load and visualize the model:

Python

import mujoco
import numpy as np
import mediapy as media  # For displaying the simulation

# Load the XML model
model = mujoco.MjModel.from_xml_path("trex.xml")
data = mujoco.MjData(model)

# Create a renderer
renderer = mujoco.Renderer(model)

# Simulation loop
frames = []
mujoco.mj_resetData(model, data)  # Reset the simulation
while data.time < 5:  # Run for 5 seconds
   mujoco.mj_step(model, data)  # Advance the simulation by one step

   # Render the scene
   renderer.update_scene(data)
   frames.append(renderer.render())

# Display the simulation as a video
media.show_video(frames, fps=1/model.opt.timestep)
print("Simulation Complete")

This script does the following:

  1. Loads the model: mujoco.MjModel.from_xml_path("trex.xml") loads your XML file.
  2. Creates data: mujoco.MjData(model) creates a data structure to store the simulation state.
  3. Creates Renderer: This will be used to render and save each frame of the simulation.
  4. Simulation Loop:
    • mujoco.mj_resetData(model, data): Resets the simulation to its initial state.
    • mujoco.mj_step(model, data):  This is the core function.  It advances the physics simulation by one timestep.
    • renderer.update_scene(data) and renderer.render(): Renders the scene from the default camera.
    • frames.append(...): Appends current frame to frames array to create video
  5. Display Video: Uses mediapy to display the video using the collected frames.

Run this script. You should see your T-Rex model in MuJoCo, but it will just fall to the ground – we haven't taught it to walk yet!

Step 3:  The Reinforcement Learning Magic

This is where things get exciting (and challenging).  Reinforcement learning (RL) is a type of machine learning where an "agent" (our T-Rex) learns to interact with an "environment" (the MuJoCo simulation) to achieve a goal (walking).  The agent learns by trial and error, receiving "rewards" for actions that get it closer to the goal and "penalties" for actions that don't.

Here's a high-level overview of the RL process:

  1. Observation: The agent observes the current state of the environment.  This might include the positions and velocities of all its joints, and contact forces.
  2. Action: Based on its observation, the agent chooses an action.  In our case, this means setting the target angles (or torques) for each of the T-Rex's motors.
  3. Reward: The environment provides a reward based on how good the action was.  A good reward function for walking might include:
    • Positive reward for forward movement.
    • Small negative reward for using too much energy (to encourage efficiency).
    • Large negative reward for falling over.
  4. State Transition: The environment updates its state based on the agent's action.  This is where MuJoCo's physics engine comes in.
  5. Repeat: Steps 1-4 are repeated until the agent learns a good policy (a strategy for choosing actions).

Choosing an RL Algorithm

Several RL algorithms could be used, including:

  • Proximal Policy Optimization (PPO):  A popular and relatively stable algorithm.  It's a good starting point.
  • Soft Actor-Critic (SAC):  Another popular choice, often good for continuous action spaces (like our motor controls).
  • TD3 (Twin Delayed Deep Deterministic Policy Gradient):  A good option for continuous control tasks.

Additional Learning Materials

Code Repository & Models

Updated On:
March 1, 2025
Follow on social media: