SUMO Ampelphasen · Fabian Egenberger

Role: Research, implementation, evaluation
Timeline: DHBW · 2nd semester · ongoing 2026
Project type: University project
Status: Work in progress

Context

Classical traffic light control runs on a rigid schedule. Under changing load this means unnecessary waiting and uneven throughput.

Reinforcement learning is an attractive alternative here. An agent observes the queue lengths and the current phase, selects an action, and learns from the resulting reward.

Approach

The simulation runs in SUMO, an open-source traffic simulator, on a simple four-arm intersection. The agent is a Deep Q-Network, trained with Stable-Baselines3, and decides between Phase A and Phase B at each step.
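To make the observe, act, step cycle concrete, here is a minimal sketch in plain Python. It does not use SUMO, TraCI, or Stable-Baselines3. `ToyIntersection`, `longest_queue_policy`, the arrival probability, and the mapping of Phase A to north/south green are all assumptions made for illustration. The hand-written policy only marks the spot where the trained DQN would choose the action.

```python
import random
from dataclasses import dataclass, field

PHASE_A, PHASE_B = 0, 1  # assumed mapping: A = north/south green, B = east/west green


@dataclass
class ToyIntersection:
    """Hypothetical stand-in for the SUMO simulation: four queues, one active phase."""
    arrival_prob: float = 0.3  # assumed chance per arm and step that a vehicle arrives
    seed: int = 0
    queues: list = field(default_factory=lambda: [0, 0, 0, 0])  # N, S, E, W
    phase: int = PHASE_A

    def __post_init__(self):
        # Seeded generator so that repeated runs produce the same arrivals.
        self.rng = random.Random(self.seed)

    def observe(self):
        # Observation as described in the text: queue lengths plus current phase.
        return (*self.queues, self.phase)

    def step(self, action):
        """Apply the chosen phase, advance one step, return (observation, departed)."""
        self.phase = action
        green_arms = (0, 1) if self.phase == PHASE_A else (2, 3)
        departed = 0
        for arm in green_arms:
            # At most one vehicle per green arm leaves in each step.
            if self.queues[arm] > 0:
                self.queues[arm] -= 1
                departed += 1
        for arm in range(4):
            if self.rng.random() < self.arrival_prob:
                self.queues[arm] += 1
        return self.observe(), departed


def longest_queue_policy(obs):
    """Placeholder policy: give green to the axis with more waiting vehicles."""
    north, south, east, west, _phase = obs
    return PHASE_A if north + south >= east + west else PHASE_B


env = ToyIntersection()
obs = env.observe()
total_departed = 0
for _ in range(100):
    action = longest_queue_policy(obs)  # the DQN would decide here
    obs, departed = env.step(action)
    total_departed += departed
```

In the real setup the queue lengths would come from the simulator rather than from a random generator, and the two-action choice would be wrapped in whatever environment interface the training library expects.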

The reward combines three signals: a negative term for wait time, a positive term for throughput, and a penalty for frequent phase switching. Training runs offline on episode snapshots; evaluation compares the agent against a fixed-time baseline.
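One possible shape for such a three-part reward is sketched below. The weights in `RewardWeights`, the function name `compute_reward`, and the units (seconds of wait, vehicles passed per step) are assumptions for illustration. The project's actual coefficients and scaling are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients; the real values are not given in the text.
    wait: float = 1.0
    throughput: float = 0.5
    switch: float = 2.0


def compute_reward(total_wait_s, vehicles_passed, switched, w=RewardWeights()):
    """Combine the three signals into one scalar reward for a single step."""
    wait_term = -w.wait * total_wait_s                # waiting is penalized
    throughput_term = w.throughput * vehicles_passed  # cleared vehicles are rewarded
    switch_term = -w.switch if switched else 0.0      # discourages rapid phase flipping
    return wait_term + throughput_term + switch_term


# Same traffic situation, once with the phase held and once with a switch.
reward_hold = compute_reward(total_wait_s=10.0, vehicles_passed=4, switched=False)
reward_switch = compute_reward(total_wait_s=10.0, vehicles_passed=4, switched=True)
```

With these assumed weights the switch costs exactly `w.switch` compared with holding the phase, so the agent should only change phases when the expected gain in wait time or throughput outweighs that cost.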

Status

After roughly 4000 episodes, the agent sits about 12 percent above the baseline on mean reward. The next phase is the interesting one: multiple intersections and a multi-agent setup with shared observations. The goal is not the perfect algorithm but a fully documented learning project.
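A percentage on mean reward needs a convention, because rewards dominated by wait time are usually negative. The sketch below shows one plausible reading: the difference relative to the magnitude of the baseline. The function name and the numbers are made up for illustration; they are not the project's measurements.

```python
def relative_improvement(agent_mean, baseline_mean):
    """Improvement of the agent over the baseline, as a fraction of |baseline|."""
    if baseline_mean == 0:
        raise ValueError("baseline mean reward must be non-zero")
    # Dividing by the absolute value keeps the sign meaningful for negative rewards.
    return (agent_mean - baseline_mean) / abs(baseline_mean)


# Illustrative values only: a baseline of -100 and an agent at -88.
example = relative_improvement(agent_mean=-88.0, baseline_mean=-100.0)
```

Under this convention a positive result means the agent collected more reward than the fixed-time controller, regardless of whether both means are negative.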

Stack

Python 3.12
PyTorch
Stable-Baselines3
SUMO
TraCI
Matplotlib
