We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds, but the resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatiotemporal consistency and robustness to image-level augmentations, and then train a 3D network on the refined labels. This simple method, called LOSC, outperforms the state of the art in zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI, by significant margins.
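
For intuition, below is a minimal sketch (Python/NumPy, not the authors' code) of the two steps the abstract names: back-projecting per-pixel VLM labels onto lidar points through a pinhole camera model, and consolidating the resulting noisy point labels. All function names are hypothetical, and the simple majority vote stands in for LOSC's actual consolidation procedure.

```python
# Minimal sketch of label back-projection and consolidation (assumptions,
# not the LOSC implementation). A per-pixel label map from a VLM is read
# at each projected lidar point, then votes from several views and/or
# image-level augmentations are merged per point.
import numpy as np

def project_points(points_lidar, T_cam_from_lidar, K):
    """Project Nx3 lidar points into pixels; returns (uv, in_front mask)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 1e-6          # keep points ahead of the camera
    uv_h = (K @ pts_cam.T).T
    uv = uv_h[:, :2] / np.maximum(uv_h[:, 2:3], 1e-6)  # perspective divide
    return uv, in_front

def backproject_labels(points_lidar, label_map, T_cam_from_lidar, K):
    """Read a per-pixel label map at each projected point; -1 = unlabeled."""
    h, w = label_map.shape
    uv, valid = project_points(points_lidar, T_cam_from_lidar, K)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(len(points_lidar), -1, dtype=int)
    labels[valid] = label_map[v[valid], u[valid]]
    return labels

def consolidate(votes, num_classes):
    """Majority vote per point over label vectors (one per view or
    augmentation); points that were never observed stay -1."""
    votes = np.stack(votes)                                   # (V, N)
    counts = np.stack([(votes == c).sum(0) for c in range(num_classes)])
    consolidated = counts.argmax(0)
    consolidated[counts.sum(0) == 0] = -1
    return consolidated
```

Majority voting is only the simplest plausible consolidation rule; the paper's spatiotemporal consistency constraints are more involved, but the data flow is the same: 2D VLM labels in, refined 3D point labels out, which then supervise the 3D network.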
@inproceedings{losc,
title={LOSC: LiDAR Open-voc Segmentation Consolidator},
author={Nermin Samet and Gilles Puy and Renaud Marlet},
booktitle={International Conference on 3D Vision (3DV)},
year={2026},
}