Publications
2023
- SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields. Cao, Anh-Quan, and de Charette, Raoul. In ICCV 2023.
In the literature, 3D reconstruction from 2D images has been extensively addressed but often still requires geometrical supervision. In this paper, we propose a self-supervised monocular scene reconstruction method with neural radiance fields (NeRF) learned from multiple posed image sequences. To improve geometry prediction, we introduce new geometry constraints and a novel probabilistic sampling strategy that efficiently updates the radiance fields. As the latter are conditioned on a single frame, scene reconstruction is achieved from the fusion of multiple synthesized novel depth views. This is enabled by our spherical decoder, which allows hallucination beyond the input frame's field of view. Thorough experiments demonstrate that we outperform all baselines on all metrics for novel depth view synthesis and scene reconstruction. (A minimal sketch of a probabilistic ray-sampling scheme in this spirit follows the BibTeX entry below.)
@inproceedings{cao2022scenerf,
  title     = {SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields},
  author    = {Cao, Anh-Quan and de Charette, Raoul},
  booktitle = {ICCV},
  year      = {2023},
}
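As a rough illustration of probabilistic sampling along rays, the sketch below resamples depths by inverse-transform sampling from per-bin weights, concentrating samples where a coarse pass suggests surfaces. This is a generic hierarchical-sampling sketch, not the authors' implementation; the tensor shapes and the `sample_depths` helper are assumptions.

```python
import torch

def sample_depths(bin_edges, weights, n_samples):
    """Inverse-transform sampling of depths along rays.

    bin_edges: (R, B+1) increasing depth bin edges per ray.
    weights:   (R, B) non-negative per-bin weights (e.g. occupancy scores).
    Returns (R, n_samples) depths concentrated where weights are high.
    """
    # Normalize weights into a per-ray PDF, then build its CDF.
    pdf = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    cdf = torch.cumsum(pdf, dim=-1)                               # (R, B)
    cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)  # (R, B+1)

    # Draw uniform samples and locate the CDF bin each one falls into.
    u = torch.rand(cdf.shape[0], n_samples, device=cdf.device)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    edge_lo = torch.gather(bin_edges, -1, idx - 1)
    edge_hi = torch.gather(bin_edges, -1, idx)

    # Linearly interpolate inside the bin for a continuous depth value.
    t = (u - cdf_lo) / (cdf_hi - cdf_lo).clamp_min(1e-8)
    return edge_lo + t * (edge_hi - edge_lo)
```

Inverse-transform sampling of this kind is the standard NeRF coarse-to-fine mechanism; it stands in here only as a plausible baseline for the paper's own sampling strategy.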
2022
- COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation. Li, Rong, Cao, Anh-Quan, and de Charette, Raoul. In BMVC 2022.
Annotation of large-scale 3D data is notoriously cumbersome and costly. As an alternative, weakly-supervised learning alleviates this need by reducing the annotation effort by several orders of magnitude. We propose a novel architecture-agnostic contrastive learning strategy for 3D segmentation. Since contrastive learning requires rich and diverse examples as keys and anchors, we propose a prototype memory bank that efficiently captures class-wise global dataset information in a small number of prototypes acting as keys. An entropy-driven sampling technique then allows us to select good pixels from predictions as anchors. Experiments using a lightweight projection-based backbone show we outperform baselines on three challenging real-world outdoor datasets, working with as little as 0.001% of the annotations. (A minimal sketch of the prototype bank and entropy-driven anchor selection follows the BibTeX entry below.)
@inproceedings{rong2022coarse3d,
  title     = {COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation},
  author    = {Li, Rong and Cao, Anh-Quan and de Charette, Raoul},
  booktitle = {BMVC},
  year      = {2022},
}
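To make the prototype-bank idea concrete, here is a minimal sketch assuming an EMA-updated class prototype bank (keys) and entropy-based selection of confident predictions (anchors). The class count, feature dimension, and helper names are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

class PrototypeBank:
    """Class-wise prototypes updated by exponential moving average (EMA)."""

    def __init__(self, n_classes, dim, momentum=0.99):
        self.protos = torch.zeros(n_classes, dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats, labels):
        # feats: (N, D) anchor features; labels: (N,) class ids.
        for c in labels.unique():
            mean = F.normalize(feats[labels == c].mean(dim=0), dim=0)
            self.protos[c] = F.normalize(
                self.momentum * self.protos[c] + (1 - self.momentum) * mean,
                dim=0)

def entropy_select(logits, k):
    """Pick the k lowest-entropy (most confident) predictions as anchors."""
    p = logits.softmax(dim=-1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=-1)
    return entropy.topk(k, largest=False).indices
```

In a contrastive loss, the selected anchors would then be pulled toward their class prototype and pushed away from the others; keeping one prototype per class is what keeps the memory footprint small.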
- MonoScene: Monocular 3D Semantic Scene Completion. Cao, Anh-Quan, and de Charette, Raoul. In CVPR 2022.
MonoScene proposes a 3D Semantic Scene Completion (SSC) framework in which the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Unlike the SSC literature, which relies on 2.5D or 3D input, we solve the complex problem of 2D-to-3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D features projection inspired by optics, and introduces a 3D context relation prior that enforces spatio-semantic consistency. Along with architectural contributions, we introduce novel global scene and local frustum losses. Experiments show we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://astra-vision.github.io/MonoScene/. (A minimal sketch of an optics-inspired 2D-3D projection follows the BibTeX entry below.)
@inproceedings{cao2022monoscene,
  title     = {MonoScene: Monocular 3D Semantic Scene Completion},
  author    = {Cao, Anh-Quan and de Charette, Raoul},
  booktitle = {CVPR},
  year      = {2022},
}
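As a hedged sketch of a 2D-3D feature projection in this spirit, the snippet below projects 3D voxel centers through a pinhole camera model and bilinearly samples the 2D feature map. It assumes camera-frame voxel coordinates and known intrinsics; `lift_2d_to_3d` is an illustrative name, not the released API.

```python
import torch
import torch.nn.functional as F

def lift_2d_to_3d(feat2d, vox_xyz, K):
    """Back-project a 2D feature map onto 3D voxel centers.

    feat2d:  (C, H, W) image features.
    vox_xyz: (N, 3) voxel centers in the camera frame (z > 0 is forward).
    K:       (3, 3) camera intrinsics.
    Returns (N, C); voxels projecting outside the image get zeros.
    """
    C, H, W = feat2d.shape

    # Perspective projection: pixel = K @ xyz, then divide by depth.
    uvw = (K @ vox_xyz.T).T                       # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp_min(1e-6)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1)
    grid = (grid * 2 - 1).view(1, 1, -1, 2)

    feats = F.grid_sample(feat2d[None], grid, align_corners=True,
                          padding_mode='zeros')   # (1, C, 1, N)

    # Zero out voxels behind the camera.
    valid = vox_xyz[:, 2] > 0
    return feats[0, :, 0].T * valid[:, None]      # (N, C)
```

Sampling every voxel from the same image is what lets voxels outside or behind the visible frustum be filled in by the subsequent 3D network rather than by direct observation.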
2021
- PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds. Cao, Anh-Quan, Puy, Gilles, Boulch, Alexandre, and Marlet, Renaud. In ICCV 2021.
Rigid registration of point clouds with partial overlaps is a longstanding problem, usually solved in two steps: (a) finding correspondences between the point clouds; (b) filtering these correspondences to keep only the most reliable ones to estimate the transformation. Recently, several deep networks have been proposed to solve these steps jointly. We build upon these works and propose PCAM: a neural network whose key element is a pointwise product of cross-attention matrices that mixes both low-level geometric and high-level contextual information to find point correspondences. A second key element is the exchange of information between the point clouds at each layer, allowing the network to exploit context from both point clouds to find the best matching points within the overlapping regions. The experiments show that PCAM achieves state-of-the-art results among methods which, like ours, solve steps (a) and (b) jointly with deep networks. (A minimal sketch of the cross-attention product follows the BibTeX entry below.)
@inproceedings{cao21pcam,
  title     = {{PCAM}: {P}roduct of {C}ross-{A}ttention {M}atrices for {R}igid {R}egistration of {P}oint {C}louds},
  author    = {Cao, Anh-Quan and Puy, Gilles and Boulch, Alexandre and Marlet, Renaud},
  booktitle = {ICCV},
  year      = {2021},
}
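The core idea, a pointwise product of two cross-attention matrices, can be sketched compactly. The snippet below is a minimal illustration assuming per-point features are already extracted at two levels; the temperature and the `pcam_matching` helper are assumptions, not the paper's exact formulation.

```python
import torch

def pcam_matching(f_low_a, f_low_b, f_high_a, f_high_b, tau=0.1):
    """Pointwise product of two cross-attention matrices.

    f_low_*, f_high_*: (N, D) / (M, D) low- and high-level features
    of point clouds A and B.
    Returns (N, M) soft correspondence scores mixing low-level geometry
    with high-level context.
    """
    def attn(fa, fb):
        # Row-stochastic cross-attention from A's points to B's points.
        sim = fa @ fb.T / tau
        return sim.softmax(dim=-1)

    # The product keeps only matches supported by BOTH feature levels.
    scores = attn(f_low_a, f_low_b) * attn(f_high_a, f_high_b)
    return scores / scores.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```

Taking a product rather than a sum acts as a soft logical AND: a candidate correspondence must score well both geometrically and contextually before it contributes to the transformation estimate.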