Our work spans four intertwined axes.
Autonomous vehicles rely on cameras, LiDAR, radar, and ultrasonic sensors to perceive their surroundings. We study how to fuse and interpret these multi-modal signals to build accurate 3D representations of the driving scene: object detection, semantic segmentation, depth estimation, motion forecasting, and pose estimation.
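As a concrete illustration of multi-modal fusion, the sketch below shows a toy late-fusion head that concatenates camera and LiDAR features on a shared bird's-eye-view grid and predicts per-cell class scores and 3D box parameters. All shapes, channel sizes, and module names are illustrative assumptions, not a specific published model.

```python
# Minimal late-fusion sketch in bird's-eye view (BEV): camera and LiDAR feature
# maps are assumed to already be projected onto a common BEV grid.
import torch
import torch.nn as nn


class BEVFusionHead(nn.Module):
    """Concatenate camera and LiDAR BEV features, then predict per-cell
    class scores and 3D box parameters (x, y, z, w, l, h, yaw)."""

    def __init__(self, cam_channels=64, lidar_channels=64, num_classes=3):
        super().__init__()
        fused = cam_channels + lidar_channels
        self.fuse = nn.Sequential(
            nn.Conv2d(fused, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.cls_head = nn.Conv2d(128, num_classes, kernel_size=1)
        self.box_head = nn.Conv2d(128, 7, kernel_size=1)  # x, y, z, w, l, h, yaw

    def forward(self, cam_bev, lidar_bev):
        x = self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
        return self.cls_head(x), self.box_head(x)


# Toy usage with random features on a 200x200 BEV grid.
head = BEVFusionHead()
cam_bev = torch.randn(1, 64, 200, 200)
lidar_bev = torch.randn(1, 64, 200, 200)
cls_logits, box_params = head(cam_bev, lidar_bev)
print(cls_logits.shape, box_params.shape)  # (1, 3, 200, 200), (1, 7, 200, 200)
```

Late fusion on a shared BEV grid is only one of several fusion strategies; early and intermediate fusion follow the same pattern with the concatenation moved to a different stage.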
Large-scale pretrained models can go beyond fixed ontologies and adapt to a wide variety of downstream tasks. We investigate vision and vision-language foundation models, as well as world models that learn to simulate and predict how driving scenes evolve over time — enabling better generalization with less task-specific supervision.
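To make the world-model idea concrete, here is a minimal latent dynamics sketch: an encoder maps the current scene embedding to a latent state, a GRU rolls that state forward conditioned on future ego actions, and a decoder predicts the observation at each step. Dimensions and module names are hypothetical assumptions for illustration only.

```python
# Toy latent world model: encode an observation into a latent state, roll the
# latent forward through a GRU conditioned on ego actions, and decode a
# prediction at each step.
import torch
import torch.nn as nn


class ToyWorldModel(nn.Module):
    def __init__(self, obs_dim=256, action_dim=2, latent_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)        # observation -> latent
        self.dynamics = nn.GRUCell(action_dim, latent_dim)   # latent rollout step
        self.decoder = nn.Linear(latent_dim, obs_dim)        # latent -> predicted obs

    def forward(self, obs, actions):
        # obs: (B, obs_dim), actions: (B, T, action_dim)
        h = torch.tanh(self.encoder(obs))
        preds = []
        for t in range(actions.size(1)):
            h = self.dynamics(actions[:, t], h)   # predict the next latent state
            preds.append(self.decoder(h))         # decode the predicted observation
        return torch.stack(preds, dim=1)          # (B, T, obs_dim)


model = ToyWorldModel()
obs = torch.randn(4, 256)        # current scene embedding, e.g. from a frozen encoder
actions = torch.randn(4, 10, 2)  # 10 future (steer, accel) commands
future = model(obs, actions)
print(future.shape)  # torch.Size([4, 10, 256])
```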
Rather than treating perception and decision-making as separate modules, end-to-end approaches learn to map sensor inputs directly to driving actions. We explore neural planning architectures and physical AI methods that reason jointly about scene understanding and trajectory planning — aiming for driving systems that are both simpler and more effective.
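The sketch below illustrates the end-to-end idea in its simplest form: a small CNN backbone maps a front-camera image directly to future trajectory waypoints, trained by imitation against expert trajectories. The architecture and shapes are illustrative assumptions, not any particular published system.

```python
# Minimal end-to-end planning sketch: image in, future waypoints out.
import torch
import torch.nn as nn


class EndToEndPlanner(nn.Module):
    def __init__(self, num_waypoints=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.planner = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_waypoints * 2),  # (x, y) per waypoint
        )
        self.num_waypoints = num_waypoints

    def forward(self, image):
        feat = self.backbone(image)
        return self.planner(feat).view(-1, self.num_waypoints, 2)


planner = EndToEndPlanner()
image = torch.randn(2, 3, 224, 224)          # batch of front-camera frames
expert = torch.randn(2, 8, 2)                # expert future waypoints in ego frame
pred = planner(image)
loss = nn.functional.l1_loss(pred, expert)   # imitation (behaviour cloning) loss
loss.backward()
print(pred.shape, loss.item())
```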
Safety-critical applications demand models that are resilient to distribution shifts, adverse conditions, and unexpected inputs. We work on uncertainty estimation, domain generalization, robustness to corruptions and adversarial perturbations, and explainability methods that help understand and trust the decisions made by deep learning systems.
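As one example of uncertainty estimation, the sketch below uses Monte Carlo dropout: dropout stays active at test time, several stochastic forward passes are run, and the spread of the resulting class probabilities serves as an uncertainty signal. The classifier is a toy stand-in; only the MC-dropout procedure itself is the point.

```python
# Monte Carlo dropout sketch: predictive mean and variance over several
# stochastic forward passes with dropout kept active.
import torch
import torch.nn as nn


class DropoutClassifier(nn.Module):
    def __init__(self, in_dim=128, num_classes=5, p=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(inplace=True),
            nn.Dropout(p),                    # stays stochastic during MC sampling
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)


@torch.no_grad()
def mc_dropout_predict(model, x, num_samples=20):
    model.train()  # keep dropout enabled; in practice freeze BatchNorm separately
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
    )
    return probs.mean(dim=0), probs.var(dim=0)  # predictive mean and variance


model = DropoutClassifier()
features = torch.randn(4, 128)               # e.g. one embedding per detected object
mean_probs, var_probs = mc_dropout_predict(model, features)
print(mean_probs.shape, var_probs.shape)     # (4, 5), (4, 5)
```

High variance across samples flags inputs the model is unsure about, which is one simple way to detect distribution shift or corrupted inputs at deployment time.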