- Role: Research, implementation, evaluation
- Timeline: DHBW · 2nd semester · ongoing 2026
- Project type: University project
- Status: Work in progress
Context
Classical traffic light control runs on a rigid schedule. Under changing load this means unnecessary waiting and uneven throughput.
Reinforcement learning is an attractive alternative here. An agent observes the queue lengths and the current signal phase, selects an action, and learns from the resulting reward.
Approach
The simulation runs in SUMO, an open-source traffic simulator, on a simple four-arm intersection. The agent is a Deep Q-Network, trained with Stable-Baselines3, and decides between Phase A and Phase B at each step.
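To make the interaction loop concrete, here is a minimal sketch in plain Python. It does not use SUMO, TraCI, or Stable-Baselines3: `ToyIntersection` is a hypothetical stand-in for the simulated four-arm intersection, the arrival probability and the mapping of Phase A to north/south are assumptions, and `longest_queue_policy` is only a placeholder for the trained Deep Q-Network.

```python
import random
from dataclasses import dataclass, field

PHASE_A, PHASE_B = 0, 1  # assumed mapping: A serves north/south, B serves east/west


@dataclass
class ToyIntersection:
    """Hypothetical stand-in for the SUMO scene: four approaches, two phases."""
    arrival_prob: float = 0.3  # assumed chance of one new vehicle per approach per step
    seed: int = 0
    queues: list = field(default_factory=lambda: [0, 0, 0, 0])  # N, S, E, W
    phase: int = PHASE_A

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def observe(self):
        # Observation: four queue lengths plus the current phase.
        return (*self.queues, self.phase)

    def reset(self):
        self.queues = [0, 0, 0, 0]
        self.phase = PHASE_A
        return self.observe()

    def step(self, action):
        switched = action != self.phase
        self.phase = action
        # New vehicles arrive at random on every approach.
        for i in range(4):
            if self.rng.random() < self.arrival_prob:
                self.queues[i] += 1
        # Each approach with a green light discharges at most one vehicle.
        green = (0, 1) if self.phase == PHASE_A else (2, 3)
        passed = 0
        for i in green:
            if self.queues[i] > 0:
                self.queues[i] -= 1
                passed += 1
        return self.observe(), passed, switched


def longest_queue_policy(obs):
    """Placeholder for the learned agent: give green to the longer axis."""
    north, south, east, west, _phase = obs
    return PHASE_A if north + south >= east + west else PHASE_B


env = ToyIntersection(seed=42)
obs = env.reset()
total_passed = 0
for _ in range(200):
    action = longest_queue_policy(obs)         # agent selects Phase A or Phase B
    obs, passed, switched = env.step(action)   # environment advances one step
    total_passed += passed
```

In the actual project the environment would be driven through TraCI and the policy would be learned rather than hand-written; the sketch only shows the shape of the observation and the two-action decision.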
The reward combines three signals: a negative term for waiting time, a positive term for throughput, and a penalty for frequent phase switching. Training runs offline on episode snapshots, and the agent is evaluated against a fixed-time baseline.
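One way the three reward signals and the fixed-time baseline could be written down is sketched below. The weights `w_wait`, `w_throughput`, and `switch_penalty`, the cycle length `green_steps`, and both function names are illustrative assumptions; the project's actual values and formulation are not stated here.

```python
def reward(waiting_time, vehicles_passed, switched,
           w_wait=1.0, w_throughput=1.0, switch_penalty=0.5):
    """Combine the three signals into one scalar (weights are assumed)."""
    r = -w_wait * waiting_time           # waiting is penalised
    r += w_throughput * vehicles_passed  # throughput is rewarded
    if switched:
        r -= switch_penalty              # discourage frequent phase changes
    return r


def fixed_time_action(step, green_steps=30):
    """Baseline controller: alternate phases every `green_steps` steps."""
    return (step // green_steps) % 2     # 0 = Phase A, 1 = Phase B


example = reward(waiting_time=10.0, vehicles_passed=3, switched=True)
```

Keeping the switching penalty as a separate term makes it easy to check whether the agent's gain comes from better timing or simply from flickering between phases.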
Status
After roughly 4000 episodes, the agent's mean reward is about 12 percent above the baseline. The next phase is the interesting one: multiple intersections and a multi-agent setup with shared observations. The goal is not a perfect algorithm but a fully documented learning project.
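A relative figure such as "12 percent above the baseline" depends on how it is computed, especially when mean rewards are negative. The sketch below shows one plausible definition, dividing by the absolute baseline mean so that a less negative reward counts as an improvement. The function name and the episode rewards are made up for illustration and are not the project's measurements.

```python
from statistics import mean


def relative_improvement(agent_rewards, baseline_rewards):
    """Relative gain in mean episode reward over the baseline."""
    base = mean(baseline_rewards)
    if base == 0:
        raise ValueError("baseline mean is zero; relative gain is undefined")
    # abs() keeps the sign meaningful when rewards are negative.
    return (mean(agent_rewards) - base) / abs(base)


# Hypothetical episode rewards, chosen only to illustrate the arithmetic.
baseline = [-100, -100]
agent = [-88, -88]
gain = relative_improvement(agent, baseline)
```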
Stack
- Python 3.12
- PyTorch
- Stable-Baselines3
- SUMO
- TraCI
- Matplotlib