MACARONS: Mapping And Coverage Anticipation with RGB ONline Self-supervision

CVPR 2023

Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit

LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS


(a) NBV methods with a depth sensor (e.g., SCONE by Guédon et al.)


(b) Our approach MACARONS with an RGB sensor


We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images in a self-supervised fashion. This is closely related to the Next Best View problem (NBV), where one has to identify where to move the camera next to improve the coverage of an unknown scene. However, most of the current NBV methods rely on depth sensors, need 3D supervision and/or do not scale to large scenes.

In this paper, we propose the first deep-learning-based NBV approach for dense reconstruction of large 3D scenes from RGB images. We call this approach MACARONS, for Mapping And Coverage Anticipation with RGB Online Self-Supervision. Moreover, we provide a dedicated training procedure for online learning for scene mapping and automated exploration based on coverage optimization in any kind of environment, with no explicit 3D supervision. Consequently, our approach is also the first NBV method to learn in real-time to reconstruct and explore arbitrarily large scenes in a self-supervised fashion.

Indeed, MACARONS simultaneously learns to predict a "volume occupancy field" from color images and, from this field, to predict the NBV. We experimentally show that this greatly improves results for NBV exploration of 3D scenes. In particular, we demonstrate this on a recent dataset made of various 3D scenes and show it performs even better than recent methods requiring a depth sensor, which is not a realistic assumption for outdoor scenes captured with a flying drone. This makes our approach suitable for real-life applications on small drones with a simple color camera. More fundamentally, it shows that an autonomous system can learn to explore and reconstruct environments without any 3D information a priori.


This video illustrates how MACARONS efficiently explores and reconstructs three large 3D scenes. In particular, the video shows several key elements of our approach:

  1. The trajectory of the camera, evolving in real-time with each NBV iteration performed by the surface coverage gain module (left).
  2. The RGB input captured by the camera (top right).
  3. The surface point cloud reconstructed using the depth prediction module of MACARONS (right).
  4. The volume occupancy field computed and updated in real-time using the volume occupancy module (bottom right). In the video, we removed the points with an occupancy lower than or equal to 0.5 for clarity.
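The thresholding used for the visualization in point 4 amounts to masking out low-occupancy points. A minimal NumPy sketch (function and array names are illustrative, not from the paper):

```python
import numpy as np

def filter_occupied_points(points, occupancy, threshold=0.5):
    """Keep only points whose predicted occupancy exceeds the threshold.

    points:    (N, 3) array of 3D coordinates
    occupancy: (N,) array of occupancy probabilities in [0, 1]
    """
    mask = occupancy > threshold
    return points[mask], occupancy[mask]

# Example: four candidate points, two with occupancy above 0.5
pts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
occ = np.array([0.9, 0.2, 0.7, 0.5])
kept, kept_occ = filter_occupied_points(pts, occ)
print(len(kept))  # 2
```

Note that a point with occupancy exactly 0.5 is removed, matching the "lower than or equal to 0.5" rule above.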


1. Architecture: Reconstruct and Anticipate


MACARONS simultaneously reconstructs the scene and selects the next best camera pose by running three neural modules:

  • The depth module predicts the depth map for the current frame from the last captured frames, which is added to a point cloud that represents the scene.
  • This point cloud is used by the volume occupancy module to predict a volume occupancy field...
  • ...Which is in turn used by the surface coverage gain module to compute the surface coverage gain of a given camera pose.
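The data flow between the three modules can be sketched as follows. This is a toy pipeline with placeholder functions standing in for the actual neural networks (the camera model, names, and heuristics here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def depth_module(frames):
    # Placeholder: predict a depth map for the current frame from recent RGB frames.
    h, w = frames[-1].shape[:2]
    return np.ones((h, w))  # dummy constant depth

def backproject(depth, fx=1.0, fy=1.0):
    # Lift each pixel to a 3D point with a pinhole model (identity pose for brevity).
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (xs - w / 2) * depth / fx
    y = (ys - h / 2) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def volume_occupancy_module(point_cloud, query_points):
    # Placeholder: occupancy decays with distance to the reconstructed surface.
    d = np.linalg.norm(query_points[:, None] - point_cloud[None], axis=-1).min(axis=1)
    return np.exp(-d)

def surface_coverage_gain(occupancy_field):
    # Placeholder: score a candidate pose by how much uncertain volume it could resolve.
    return float(np.mean(occupancy_field * (1.0 - occupancy_field)))

# One pipeline pass on a dummy 4x4 RGB frame
frames = [np.zeros((4, 4, 3))]
depth = depth_module(frames)                      # depth module
cloud = backproject(depth)                        # point cloud of the scene
queries = np.random.default_rng(0).uniform(-2, 2, size=(32, 3))
occ = volume_occupancy_module(cloud, queries)     # volume occupancy module
gain = surface_coverage_gain(occ)                 # surface coverage gain module
print(gain >= 0.0)  # True
```

The point is the chaining: depth prediction feeds the point cloud, the point cloud conditions the occupancy field, and the occupancy field scores candidate camera poses.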

2. Online training: Collect data and Replay from Memory


During online exploration, we perform a training iteration at each time step t, which consists of three steps.

  1. First, during the Decision Making step, we select the next best camera pose to explore the scene by running our three modules as previously described.
  2. Second, during the Data Collection & Memory Building step, the camera moves toward the previously predicted pose, creating self-supervision signals for all three modules that are stored in the Memory.
  3. Third and last, the Memory Replay step randomly selects supervision data stored in the Memory and updates the weights of each of the three modules.
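The three steps above can be sketched as a single training iteration. This is a structural sketch only: the Memory is reduced to a fixed-size replay buffer, and the three callables (`select_nbv`, `collect_supervision`, `update_modules`) are hypothetical stand-ins for the actual modules and optimizers:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of self-supervision samples (simplified sketch)."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def store(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        # Draw a random batch, never more than the buffer currently holds.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def online_training_step(t, memory, select_nbv, collect_supervision, update_modules):
    # 1. Decision Making: run the three modules to pick the next best camera pose.
    pose = select_nbv(t)
    # 2. Data Collection & Memory Building: move toward the predicted pose and
    #    store the resulting self-supervision signals in the Memory.
    for sample in collect_supervision(pose):
        memory.store(sample)
    # 3. Memory Replay: update module weights on a random batch from the Memory.
    batch = memory.sample(batch_size=4)
    update_modules(batch)
    return pose

# Dummy callables to exercise one iteration
memory = ReplayMemory(capacity=100)
select_nbv = lambda t: ("pose", t)
collect_supervision = lambda pose: [{"pose": pose, "signal": i} for i in range(3)]
updates = []
pose = online_training_step(0, memory, select_nbv, collect_supervision, updates.append)
print(len(memory.buffer))  # 3
```

Replaying random past samples rather than only the latest capture is what lets the modules keep learning online without catastrophically forgetting earlier parts of the scene.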

Exploration of large 3D scenes (dataset from SCONE, NeurIPS22, Guédon et al.)

Statue of Liberty (3D mesh by Brian Trepanier)  

results/trajectories/liberty.png Trajectory in GT scene
results/reconstructions/liberty.png Reconstructed surface

Pisa Cathedral (3D mesh by Brian Trepanier)  

results/trajectories/pisa.png Trajectory in GT scene
results/reconstructions/pisa.png Reconstructed surface

Alhambra Palace (3D mesh by Brian Trepanier)  

results/trajectories/alhambra.png Trajectory in GT scene
results/reconstructions/alhambra.png Reconstructed surface

Pantheon (3D mesh by Brian Trepanier)  

results/trajectories/pantheon.png Trajectory in GT scene
results/reconstructions/pantheon.png Reconstructed surface

Eiffel Tower (3D mesh by Brian Trepanier)  

results/trajectories/eiffel.png Trajectory in GT scene
results/reconstructions/eiffel.png Reconstructed surface

Neuschwanstein Castle (3D mesh by Brian Trepanier)  

results/trajectories/neus.png Trajectory in GT scene
results/reconstructions/neus.png Reconstructed surface

Colosseum (3D mesh by Brian Trepanier)  

results/trajectories/colosseum.png Trajectory in GT scene
results/reconstructions/colosseum.png Reconstructed surface

Manhattan Bridge (3D mesh by Brian Trepanier)  

results/trajectories/manhattan.png Trajectory in GT scene
results/reconstructions/manhattan.png Reconstructed surface

Fushimi Castle (3D mesh by Andrea Spognetta)  

results/trajectories/fushimi.png Trajectory in GT scene
results/reconstructions/fushimi.png Reconstructed surface

Christ the Redeemer (3D mesh by Brian Trepanier)  

results/trajectories/redeemer.png Trajectory in GT scene
results/reconstructions/redeemer.png Reconstructed surface

Dunnottar Castle (3D mesh by Andrea Spognetta)  

results/trajectories/dunnottar.png Trajectory in GT scene
results/reconstructions/dunnottar.png Reconstructed surface

Bannerman Castle (3D mesh by Andrea Spognetta)  

results/trajectories/bannerman.png Trajectory in GT scene
results/reconstructions/bannerman.png Reconstructed surface

CVPR 2023 Presentation









If you find this work useful for your research, please cite:

    @inproceedings{guedon2023macarons,
        title={MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision},
        author={Gu{\'e}don, Antoine and Monnier, Tom and Monasse, Pascal and Lepetit, Vincent},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2023}
    }

Further information

If you like this project, check out our previous work on NBV: SCONE (NeurIPS 2022).


This work was granted access to the HPC resources of IDRIS under the allocation 2022-AD011013387 made by GENCI. We thank Elliot Vincent for inspiring discussions and valuable feedback on the manuscript.

The template webpage is inspired by the (awesome) work UNICORN (ECCV 2022) by Monnier et al.

© You are welcome to copy the code, please attribute the source with a link back to this page and the UNICORN webpage by Tom Monnier and remove the analytics.