Anh-Quan Cao

I am a Research Scientist at Valeo.ai, working on 3D perception for autonomous driving.

I completed my PhD in the ASTRA-Vision group at Inria, under the guidance of Raoul de Charette. Prior to this, I earned an MSc in Artificial Intelligence & Advanced Visual Computing from École Polytechnique, an MSc in Data Science from the University of Paris-Saclay, and a BSc in Computer Science from the University of Science and Technology of Hanoi (USTH).

Email Scholar GitHub

News

03/2025 Outstanding reviewer award at WACV 2025.

01/2025 I started a new position as a Research Scientist at Valeo.ai.

12/2024 I defended my PhD thesis.

09/2024 Outstanding reviewer award at ECCV 2024.

05/2024 PaSCo was selected as a Best Paper Award candidate at CVPR 2024.

05/2024 Outstanding reviewer award at CVPR 2024.

04/2024 PaSCo was accepted to CVPR 2024 as an Oral (0.8% = 90/11,532).

02/2024 I joined Amazon as an Applied Research Intern, working with Maximilian Jaritz, Matthieu Guillaumin, and Loris Bazzani.

Publications

LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts

Anh-Quan Cao, Maximilian Jaritz, Matthieu Guillaumin, Raoul de Charette, Loris Bazzani

WACV 2025

LatteCLIP is an unsupervised method to fine-tune CLIP models for specific domains without human labels. It uses Large Multimodal Models (LMMs) to generate image descriptions and a novel distillation strategy to overcome the inaccuracies of the generated descriptions.

arXiv Code

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Anh-Quan Cao, Angela Dai, Raoul de Charette

CVPR 2024 (Oral, Best Paper Award Candidate)

PaSCo introduces Panoptic Scene Completion (PSC), adding instance details to Semantic Scene Completion (SSC). It uses a hybrid mask-based CNN-transformer and MIMO-based ensembling for voxel and instance uncertainty estimation.

Project Page arXiv Code

SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

Anh-Quan Cao, Raoul de Charette

ICCV 2023

SceneRF is a self-supervised 3D reconstruction method using NeRF and monocular image sequences. It improves geometry with new constraints and a novel sampling strategy, fusing depth views from a spherical decoder for a wider field of view.

Project Page arXiv Code

COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation

Rong Li, Anh-Quan Cao, Raoul de Charette

BMVC 2022

COARSE3D is an architecture-agnostic contrastive learning method for weakly-supervised 3D point cloud segmentation. It proposes a prototype memory bank and entropy-driven sampling to achieve state-of-the-art results on outdoor datasets with minimal annotations (down to 0.001%).

arXiv Code

MonoScene: Monocular 3D Semantic Scene Completion

Anh-Quan Cao, Raoul de Charette

CVPR 2022

MonoScene infers 3D geometry and semantics from a single RGB image. It combines 2D and 3D UNets with a novel 2D-to-3D feature projection and a 3D context prior for spatio-semantic consistency. New global scene and local frustum losses enhance performance, achieving state-of-the-art results while hallucinating plausible scenes beyond the camera’s view.

Project Page arXiv Code

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

Anh-Quan Cao, Gilles Puy, Alexandre Boulch, Renaud Marlet

ICCV 2021

PCAM is a deep learning method for rigid point cloud registration with partial overlaps, jointly solving correspondence finding and filtering. It leverages a pointwise product of cross-attention matrices to integrate geometric and contextual information, enhancing feature matching in overlapping regions.

arXiv Code

Academic services

Conference:

2025: WACV, AAAI, CVPR, ICLR
2024: CVPR (Outstanding Reviewer Award), ECCV (Outstanding Reviewer Award), ACCV
2023: WACV, CVPR, ICCV

Journal:

2024: TGCV, TPAMI
2023: Pattern Recognition, ACM MM