FORMULA1
Competitive car racing via reinforcement learning
ABOUT
In this project, we train reinforcement learning agents to navigate a racetrack in a simple continuous control task. We first train and compare the performance of five single-agent algorithms, and then train a multi-agent algorithm so that agents navigate the course competitively. Check out the details of our project below!
METHODS
In the first phase of our project, we trained and compared several single-agent RL methods. In the second phase, we focused on training a multi-agent method.
Single-agent RL
01
DQN
DQN is an extension of Q-learning that uses deep neural networks to approximate the Q-function
02
DDQN
DDQN is a variant of DQN that uses a second network to decouple action selection from action evaluation, which reduces overestimation and stabilizes training (sketched below)
03
DDPG
DDPG is a model-free algorithm that combines the actor-critic approach with ideas from DQN to handle continuous action spaces
04
A3C
A3C is an actor-critic method in which multiple workers train in parallel on separate copies of the environment, which stabilizes convergence
05
PPO
PPO is a policy gradient method that clips the objective function to keep each policy update close to the previous policy (sketched below)
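To illustrate the DDQN idea mentioned above, here is a minimal sketch of how the bootstrap target differs between DQN and DDQN. This is PyTorch-style illustration code; the function and variable names are our own for this example and are not taken from our implementation.

import torch

def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next action,
    # which tends to overestimate Q-values because of the max operator.
    next_q = target_net(next_obs).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def ddqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the next action and the target
    # network evaluates it, decoupling selection from evaluation.
    next_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
    next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q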
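The clipping used by PPO can be summarized in a few lines. The sketch below shows the clipped surrogate objective; the function name and the default clipping range of 0.2 are illustrative assumptions, not details of our implementation.

import torch

def ppo_clipped_loss(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated advantage A(s, a)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # PPO maximizes the minimum of the two terms; the negative sign turns it into a loss
    return -torch.min(unclipped, clipped).mean()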
Multi-agent RL
MADDPG
Many real-world applications of RL involve interactions among multiple agents. Traditional RL algorithms designed for single-agent settings do not perform well in multi-agent domains: from each agent's perspective the environment becomes non-stationary, which poses challenges for Q-learning, and policy gradient methods suffer from high variance.
In this work, we implement and train MADDPG, a recently developed RL method for multi-agent settings. MADDPG extends the actor-critic policy gradient framework: during training, each critic is given additional information about the other agents' policies, while this information is hidden from the actors. During execution, only the local actors are used, each acting in a decentralized manner on its own observations. Using MADDPG, we trained two agents to compete against each other in the car racing environment.
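To make the training/execution split concrete, here is a minimal sketch of the two network types in this centralized-critic, decentralized-actor setup. It is written in PyTorch; the class names, layer sizes, and tanh output are illustrative assumptions and do not reflect our exact implementation.

import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    # Used only during training: scores the joint observations and actions of all agents.
    def __init__(self, obs_dims, act_dims, hidden=128):
        super().__init__()
        joint_dim = sum(obs_dims) + sum(act_dims)  # concatenate everything
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # Q-value for one agent
        )

    def forward(self, all_obs, all_actions):
        # all_obs / all_actions: lists of per-agent tensors, batch-first
        x = torch.cat(all_obs + all_actions, dim=-1)
        return self.net(x)

class DecentralizedActor(nn.Module):
    # Used at execution time: maps an agent's own observation to its action.
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

During training each agent's critic sees the full joint observation-action tuple, but at race time only the decentralized actors are queried, one per car.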