Baselines
Demonstration Against Human Experts
Random Action Agent
Our `RandomActionAgent` is a basic demonstration of an agent interacting with the environment, which we explain in the Getting Started section of the docs. This agent simply takes random actions, with an action space restricted to non-negative acceleration values.
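As a sketch of what such an agent might look like, the snippet below draws uniform random actions while clamping the lower acceleration bound at zero. The class name matches the baseline, but the ranges and the `select_action` interface are assumptions for illustration, not the environment's actual API.

```python
import random

class RandomActionAgent:
    """Minimal random-action agent sketch (hypothetical interface).

    Actions are (steering, acceleration) pairs; acceleration is
    restricted to non-negative values, as in the baseline.
    """

    def __init__(self, steering_range=(-1.0, 1.0), accel_range=(0.0, 1.0)):
        self.steering_range = steering_range
        self.accel_range = accel_range  # lower bound clamped at 0

    def select_action(self, observation=None):
        # The observation is ignored: actions are drawn uniformly at random.
        steering = random.uniform(*self.steering_range)
        acceleration = random.uniform(*self.accel_range)
        return (steering, acceleration)
```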
Usage
As mentioned previously, you need the docker image with the simulator to use the environment. To use this agent, simply pass the `-b` flag with the argument `random` to `./run.bash`.
$ chmod +x run.bash # make our script executable
$ ./run.bash -b random
Soft Actor-Critic
We also provide a more detailed demonstration of how to use the environment with our Soft Actor-Critic agent, based on OpenAI's Spinning Up PyTorch implementation with minor adjustments. Specifically, these adjustments include: wrapping methods that return observations from the environment so raw images are first encoded into a latent representation, waiting until the end of the episode to make gradient updates, and removing unused functionality.
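The observation-wrapping adjustment can be sketched as a thin wrapper whose `reset()` and `step()` pass raw images through an encoder before returning them. The encoder below is a random-projection stand-in for the pretrained encoder, and the Gym-style `(obs, reward, done, info)` signature is an assumption for illustration.

```python
import numpy as np

def stub_encoder(image):
    # Stand-in for a pretrained image encoder: flatten the image and
    # project it to a 32-dimensional latent vector (illustrative only).
    flat = image.reshape(-1).astype(np.float64)
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((32, flat.size))
    return projection @ flat / flat.size

class EncodedObservationWrapper:
    """Wraps an environment so reset()/step() return latent codes
    instead of raw images (hypothetical Gym-style interface)."""

    def __init__(self, env, encoder):
        self.env = env
        self.encoder = encoder

    def reset(self):
        return self.encoder(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.encoder(obs), reward, done, info
```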
Training Performance
For both tracks, we provide our agent's model after 1000 episodes, which amounted to slightly less than 1 million environment steps for the Las Vegas track and slightly more for the Thruxton track.
Las Vegas Track
Our agent consistently, though not always, completes laps in under 2 minutes each.
Thruxton
The agent demonstrates control but never completes a lap due to the trap near the end of the course, which requires multiple sharp turns.
Evaluation Performance
The SAC agent struggles to transfer its learned experience from the Thruxton track to the Las Vegas evaluation track, even after 60 minutes of exploration: it learns to simply stop in the middle of the track to avoid the penalty for going out-of-bounds.
Usage
To run the trained model, simply provide the `-b` flag and argument `sac` to `run.bash`. Both the encoder and checkpoint models were trained separately for each track, so if you would like to switch to the Thruxton track, be sure to change the encoder and checkpoint paths in `configs/params_sac.yaml` in addition to the track name.
$ chmod +x run.bash # make our script executable
$ ./run.bash -b sac
Vision-Only Perception & Control
This agent learns non-trivial control of the race car exclusively from visual features. First, we pretrained a variational autoencoder (VAE) on the provided sample image datasets so that our agent can learn from a low-dimensional representation of the images. Our VAE is a slight modification of Shubham Chandel's implementation.
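Two pieces at the heart of VAE training are the reparameterization trick (sampling a latent as `z = mu + sigma * eps` so gradients can flow through the sampling step) and the closed-form KL term against a standard normal prior. The NumPy sketch below illustrates both in isolation; it is not the baseline's actual model code.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps: sampling is rewritten as a deterministic
    # function of (mu, logvar) plus independent noise eps ~ N(0, I).
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mu.shape)
    return mu + std * eps

def kl_divergence(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)), summed over latent
    # dimensions and averaged over the batch.
    per_example = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(per_example))
```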
Restriction of the Action Space
For this agent, we restricted the scaled action space to `[-0.1, 4.0]` for acceleration and `[-0.3, 0.3]` for steering to allow for faster convergence.
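One common way to impose such a restriction is to map the policy's raw output in `[-1, 1]` linearly onto the restricted ranges. The sketch below assumes this affine-scaling approach and a `(steering, acceleration)` ordering; both are assumptions for illustration, not the baseline's actual implementation.

```python
import numpy as np

# Restricted bounds from the text: acceleration in [-0.1, 4.0],
# steering in [-0.3, 0.3].
ACCEL_RANGE = (-0.1, 4.0)
STEER_RANGE = (-0.3, 0.3)

def scale_action(raw_action):
    """Map a raw policy output in [-1, 1]^2 onto the restricted space.

    raw_action[0] is steering, raw_action[1] is acceleration
    (an assumed ordering for illustration).
    """
    raw = np.clip(np.asarray(raw_action, dtype=float), -1.0, 1.0)
    steer = STEER_RANGE[0] + (raw[0] + 1.0) * 0.5 * (STEER_RANGE[1] - STEER_RANGE[0])
    accel = ACCEL_RANGE[0] + (raw[1] + 1.0) * 0.5 * (ACCEL_RANGE[1] - ACCEL_RANGE[0])
    return np.array([steer, accel])
```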
Custom Reward Policy
Additionally, we modified the environment's default reward policy to include a bonus, at each step, when the agent is near the center of the track, but only if it has made progress down the track. Doing so has several consequences, including:
encouraging the agent to safely stay near the middle of the track
disincentivizing the agent from engaging in corner cutting
implicitly rewarding the agent for driving more slowly
As such, this reward allows for faster convergence in terms of the number of episodes before the agent completes its first lap. However, we noticed that the agent learns to zig-zag; we believe this may be a deliberate strategy to slow down and gather more near-center bonuses.
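The modified reward policy described above can be sketched as a small function: a bonus is added only when the agent both made forward progress and is near the track center. The threshold and bonus values, and all parameter names, are illustrative assumptions rather than the environment's actual settings.

```python
def centered_progress_reward(base_reward, progress_delta, dist_from_center,
                             center_threshold=0.5, bonus=0.1):
    """Add a near-center bonus only when the car made forward progress.

    Hypothetical parameters: progress_delta is the change in progress
    along the track this step; dist_from_center is the lateral distance
    from the centerline. Values are illustrative, not the real config.
    """
    reward = base_reward
    if progress_delta > 0 and dist_from_center < center_threshold:
        reward += bonus
    return reward
```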
Model Predictive Control
We also include a non-learning, model predictive control (MPC) agent with the environment. This reference implementation demonstrates a controller that attempts to minimize tracking error with respect to the centerline of the racetrack at a pre-specified reference speed.
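An objective of this kind can be sketched as a cost summed over a prediction horizon: squared distance to the nearest centerline point plus squared deviation from the reference speed. The weights and the nearest-point formulation below are assumptions for illustration, not the baseline's exact cost function.

```python
import numpy as np

def tracking_cost(positions, speeds, centerline_points, v_ref,
                  w_track=1.0, w_speed=0.1):
    """Illustrative MPC-style stage cost over a predicted trajectory.

    positions: list of (x, y) points along the predicted trajectory.
    speeds: predicted speed at each point.
    centerline_points: (N, 2) array of sampled centerline points.
    """
    cost = 0.0
    for p, v in zip(positions, speeds):
        # Squared distance to the nearest sampled centerline point.
        d2 = np.min(np.sum((centerline_points - np.asarray(p)) ** 2, axis=1))
        cost += w_track * d2 + w_speed * (v - v_ref) ** 2
    return cost
```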
Performance
The MPC agent performs well on the Thruxton track, consistently completing laps by following a conservative trajectory. On the LVMS track, however, it occasionally falters at the points of highest curvature.
Usage
To run this agent, simply provide the `-b` flag and argument `mpc` to `run.bash`. Do note, however, that the MPC agent requires `torch<=1.4`, unlike the SAC baseline.
$ chmod +x run.bash # make our script executable
$ ./run.bash -b mpc