JAFAR: Jack up Any Feature at Any Resolution

Paul Couairon*   Loick Chambon*    Louis Serrano   Jean-Emmanuel Haugeard   Matthieu Cord   Nicolas Thome

*Equal Contribution

Under review, 2025

Paper   Code    Website   


Abstract

Foundation Vision Encoders have become essential for a wide range of dense vision tasks. However, their low-resolution spatial feature outputs necessitate feature upsampling to produce the high-resolution modalities required for downstream tasks. In this work, we introduce JAFAR—a lightweight and flexible feature upsampler that enhances the spatial resolution of visual features from any Foundation Vision Encoder to an arbitrary target resolution. JAFAR employs an attention-based module designed to promote semantic alignment between high-resolution queries—derived from low-level image features—and semantically enriched low-resolution keys, using Spatial Feature Transform (SFT) modulation. Notably, despite the absence of high-resolution supervision, we demonstrate that learning at low upsampling ratios and resolutions generalizes remarkably well to significantly higher output scales. Extensive experiments show that JAFAR effectively recovers fine-grained spatial details and consistently outperforms existing feature upsampling methods across a diverse set of downstream tasks.
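At a high level, the upsampler can be read as a single cross-attention layer in which high-resolution queries (derived from low-level image features) attend to low-resolution keys that have been semantically enriched through SFT modulation, and aggregate the low-resolution encoder features as values. The PyTorch sketch below illustrates this idea only; the module names, dimensions, and exact wiring are assumptions, not the authors' implementation.

```python
# Minimal, non-official sketch of the cross-attention upsampling idea described in
# the abstract. All names, dimensions, and wiring are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFTModulation(nn.Module):
    """Spatial Feature Transform: per-pixel scale and shift predicted from a condition map."""
    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.to_scale = nn.Conv2d(cond_dim, feat_dim, 1)
        self.to_shift = nn.Conv2d(cond_dim, feat_dim, 1)

    def forward(self, feat, cond):
        return feat * (1 + self.to_scale(cond)) + self.to_shift(cond)

class AttentionUpsampler(nn.Module):
    """Queries: high-resolution, from low-level image features.
    Keys: low-resolution, semantically enriched via SFT with encoder features.
    Values: the low-resolution encoder features themselves."""
    def __init__(self, img_dim=32, feat_dim=384, qk_dim=64):
        super().__init__()
        self.to_q = nn.Conv2d(img_dim, qk_dim, 1)
        self.to_k = nn.Conv2d(img_dim, qk_dim, 1)
        self.sft = SFTModulation(cond_dim=feat_dim, feat_dim=qk_dim)

    def forward(self, img_feat_hr, enc_feat_lr):
        B, C, Hk, Wk = enc_feat_lr.shape
        _, _, Hq, Wq = img_feat_hr.shape
        # High-resolution queries from low-level image features.
        q = self.to_q(img_feat_hr).flatten(2).transpose(1, 2)             # (B, HqWq, d)
        # Low-resolution keys, modulated by the semantic encoder features.
        img_feat_lr = F.adaptive_avg_pool2d(img_feat_hr, (Hk, Wk))
        k = self.sft(self.to_k(img_feat_lr), enc_feat_lr)
        k = k.flatten(2).transpose(1, 2)                                   # (B, HkWk, d)
        v = enc_feat_lr.flatten(2).transpose(1, 2)                         # (B, HkWk, C)
        # Cross-attention: each high-res query aggregates low-res encoder features.
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        out = attn @ v                                                     # (B, HqWq, C)
        return out.transpose(1, 2).reshape(B, C, Hq, Wq)
```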


By upsampling features from any backbone, JAFAR improves metrics on many downstream tasks: semantic segmentation, depth estimation, class activation maps, zero-shot open-vocabulary segmentation, and bird's-eye-view segmentation.

Results

Once trained, JAFAR can efficiently upsample features from any backbone to any target resolution. Visually, the upsampled features are sharper and more detailed than the original low-resolution features, which translates into better performance on downstream tasks. A minimal usage sketch is given after the figure below.

PCA visualization of features from various upsamplers.
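Reusing the AttentionUpsampler sketch above with a DINOv2-S/14 backbone from torch.hub, upsampling its 32×32 patch features to, for example, 224×224 looks roughly as follows. The raw RGB image stands in for the learned low-level query features, and the shapes assume a 448×448 input; the actual JAFAR API lives in the linked code repository.

```python
# Illustrative usage only: JAFAR's real interface is in the code repository above.
import torch
import torch.nn.functional as F

backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
image = torch.randn(1, 3, 448, 448)                                   # stand-in for a real image

with torch.no_grad():
    tokens = backbone.forward_features(image)['x_norm_patchtokens']   # (1, 1024, 384)
    lr_feats = tokens.transpose(1, 2).reshape(1, 384, 32, 32)         # 448 / 14 = 32 patches per side

    # Placeholder query features: the RGB image resized to the target resolution.
    img_feat_hr = F.interpolate(image, size=(224, 224), mode='bilinear')
    upsampler = AttentionUpsampler(img_dim=3, feat_dim=384)           # sketch from above
    hr_feats = upsampler(img_feat_hr, lr_feats)                       # (1, 384, 224, 224)
```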

Below we present results on various tasks, comparing JAFAR with other upsamplers such as Bilinear, FeatUp, and LiFT. More comparisons can be found in the paper.

Semantic Segmentation

We perform linear probing on upsampled features from various upsamplers on several datasets: VOC, COCO, ADE20k, and Cityscapes. JAFAR consistently outperforms the other methods across all datasets; a sketch of the probing setup is given after the table.

Linear probing results for semantic segmentation across various upsamplers.
Method COCO mIoU (↑) VOC mIoU (↑) ADE20k mIoU (↑) Cityscapes mIoU (↑)
Training-Free
Bilinear 59.03 80.70 39.23 59.37
Task-Agnostic
FeatUp 60.10 81.08 38.82 56.06
LiFT 58.18 78.06 38.73 58.75
JAFAR (ours) 60.78 (+1.75) 84.44 (+3.74) 40.49 (+1.26) 61.47 (+2.10)
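As a reference for the protocol, here is a minimal linear-probing sketch on frozen, pre-upsampled features; the shapes, optimizer, and number of steps are placeholder assumptions, not the paper's exact training recipe.

```python
# Minimal linear-probe sketch: a 1x1 convolution trained on frozen upsampled features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_linear_probe(upsampled_feats, labels, num_classes, feat_dim=384, steps=1000):
    """upsampled_feats: (B, C, H, W) frozen features; labels: (B, H, W) long class ids."""
    probe = nn.Conv2d(feat_dim, num_classes, kernel_size=1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(steps):
        logits = probe(upsampled_feats)                           # (B, K, H, W) per-pixel logits
        loss = F.cross_entropy(logits, labels, ignore_index=255)  # 255 = ignore label
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe
```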

Depth Estimation

For depth estimation, we evaluate the upsampled features using the δ₁ and RMSE metrics; their definitions are recalled after the table. JAFAR again outperforms the other upsamplers.

Linear probing results for depth estimation across various upsamplers.
Method δ₁ (↑) RMSE (↓)
Training-Free
Bilinear 59.92 0.66
Task-Agnostic
FeatUp 61.69 0.64
LiFT 57.04 0.70
JAFAR (ours) 62.18 (+2.26) 0.62 (-0.04)
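For reference, both metrics in their standard form (the table reports δ₁ as a percentage); how depth predictions are obtained from the probed features is not restated here.

```python
# Standard monocular-depth metrics: threshold accuracy (delta_1) and RMSE.
import torch

def delta1(pred, gt):
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25 (multiply by 100 for %)."""
    ratio = torch.maximum(pred / gt, gt / pred)
    return (ratio < 1.25).float().mean()

def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depth."""
    return torch.sqrt(((pred - gt) ** 2).mean())
```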

Class Activation Maps

When evaluating Class Activation Maps (CAMs), JAFAR produces maps that are better aligned with objects and more fine-grained, indicating a better feature representation. The two most common faithfulness metrics are recalled after the table.

Class Activation Map visualizations across various upsamplers.
Method A.D (↓) A.I (↑) A.G (↑) ADCC (↑)
Training-Free
Bilinear 19.0 18.5 3.4 61.7
Task-Agnostic
FeatUp 15.3 24.0 4.3 64.3
LiFT 66.9 8.7 2.3 53.0
JAFAR (ours) 17.4 (-1.6) 30.9 (+12.4) 6.5 (+3.1) 73.3 (+11.6)
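For reference, a sketch of the two most common CAM faithfulness metrics in their usual form, Average Drop (A.D) and Average Increase (A.I); A.G and ADCC follow their respective definitions in the literature and are omitted here.

```python
# Usual definitions of Average Drop and Average Increase for CAM evaluation;
# obtaining the masked-image confidences is assumed and not shown.
import torch

def average_drop_increase(scores_full, scores_masked):
    """scores_full / scores_masked: (N,) class confidences on the original images
    and on images masked by the (upsampled) CAMs."""
    drop = torch.clamp(scores_full - scores_masked, min=0) / scores_full
    a_d = 100 * drop.mean()                                    # Average Drop, lower is better
    a_i = 100 * (scores_masked > scores_full).float().mean()   # Average Increase, higher is better
    return a_d, a_i
```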

Vehicle Segmentation

Even in more complex tasks and pipelines, such as vehicle segmentation in Bird's Eye View (BEV), JAFAR brings significant improvements.

Vehicle segmentation in Bird's Eye View using DINOv2 + JAFAR.
Upsampling SimpleBeV mIoU (↑) PointBeV mIoU (↑) BeVFormer mIoU (↑)
Training-Free
Low-Res 31.75 34.89 33.72
Bilinear 33.67 36.01 34.18
Task-Agnostic
FeatUp 33.95 35.38 34.01
JAFAR (ours) 36.59 (+2.92) 37.20 (+1.19) 36.54 (+2.36)

Zero-shot Open Vocabulary

We also observe clear improvements on zero-shot open-vocabulary segmentation; a sketch of a typical evaluation pipeline is given after the table.

Upsampling VOC mIoU (↑) ADE mIoU (↑) City mIoU (↑)
Training-Free
Bilinear 27.87 11.03 21.56
Task-Agnostic
FeatUp 32.27 13.03 24.76
JAFAR (ours) 35.70 (+7.83) 13.61 (+2.58) 25.26 (+3.70)
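As a rough sketch of a typical zero-shot open-vocabulary pipeline: class names are embedded with a vision-language text encoder, and each upsampled pixel feature is assigned to its most similar class prompt. The exact prompts and text encoder used for the table above are described in the paper.

```python
# Hedged sketch of per-pixel zero-shot classification with upsampled features;
# `text_embeds` is assumed to come from a standard CLIP-style text encoder.
import torch
import torch.nn.functional as F

def zero_shot_segment(hr_feats, text_embeds):
    """hr_feats: (B, C, H, W) upsampled image features; text_embeds: (K, C) class embeddings."""
    feats = F.normalize(hr_feats, dim=1)
    texts = F.normalize(text_embeds, dim=1)
    logits = torch.einsum('bchw,kc->bkhw', feats, texts)   # cosine similarity per class
    return logits.argmax(dim=1)                            # (B, H, W) predicted class ids
```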

BibTeX

@inproceedings{couairon2025jafar,
      title={JAFAR: Jack up Any Feature at Any Resolution}, 
      author={Paul Couairon and Loick Chambon and Louis Serrano and Jean-Emmanuel Haugeard and Matthieu Cord and Nicolas Thome},
      year={2025},
      url={https://jafar-upsampler.github.io}, 
}