MACARONS: Mapping And Coverage Anticipation with RGB ONline Self-supervision

CVPR 2023

Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit

LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS


(a) NBV methods with a depth sensor (e.g., SCONE by Guédon et al.)


(b) Our approach MACARONS with an RGB sensor


We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images in a self-supervised fashion. This is closely related to the Next Best View problem (NBV), where one has to identify where to move the camera next to improve the coverage of an unknown scene. However, most of the current NBV methods rely on depth sensors, need 3D supervision and/or do not scale to large scenes.

In this paper, we propose the first deep-learning-based NBV approach for dense reconstruction of large 3D scenes from RGB images. We call this approach MACARONS, for Mapping And Coverage Anticipation with RGB Online Self-Supervision. Moreover, we provide a dedicated training procedure for online learning for scene mapping and automated exploration based on coverage optimization in any kind of environment, with no explicit 3D supervision. Consequently, our approach is also the first NBV method to learn in real-time to reconstruct and explore arbitrarily large scenes in a self-supervised fashion.

Indeed, MACARONS simultaneously learns to predict a "volume occupancy field" from color images and, from this field, to predict the NBV. We experimentally show that this greatly improves results for NBV exploration of 3D scenes. In particular, we demonstrate this on a recent dataset made of various 3D scenes and show it performs even better than recent methods requiring a depth sensor, which is not a realistic assumption for outdoor scenes captured with a flying drone. This makes our approach suitable for real-life applications on small drones with a simple color camera. More fundamentally, it shows that an autonomous system can learn to explore and reconstruct environments without any 3D information a priori.


This video illustrates how MACARONS efficiently explores and reconstructs three large 3D scenes. In particular, the video shows several key elements of our approach:

  1. The trajectory of the camera, evolving in real-time with each NBV iteration performed by the surface coverage gain module (left).
  2. The RGB input captured by the camera (top right).
  3. The surface point cloud reconstructed using the depth prediction module of MACARONS (right).
  4. The volume occupancy field computed and updated in real-time using the volume occupancy module (bottom right). In the video, we removed the points with an occupancy lower than or equal to 0.5 for clarity.
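The thresholding used for the visualization in point 4 amounts to masking out low-occupancy points. A minimal NumPy sketch (function and array names are illustrative, not from the paper):

```python
import numpy as np

def filter_occupied_points(points, occupancy, threshold=0.5):
    """Keep only points whose predicted occupancy exceeds the threshold.

    points:    (N, 3) array of 3D coordinates
    occupancy: (N,) array of occupancy probabilities in [0, 1]
    """
    mask = occupancy > threshold
    return points[mask], occupancy[mask]

# Example: four candidate points, two with occupancy above 0.5
pts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
occ = np.array([0.9, 0.2, 0.7, 0.5])
kept, kept_occ = filter_occupied_points(pts, occ)
print(len(kept))  # 2
```

Note that a point with occupancy exactly 0.5 is removed, matching the "lower than or equal to 0.5" rule above.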


1. Architecture: Reconstruct and Anticipate


MACARONS simultaneously reconstructs the scene and selects the next best camera pose by running three neural modules:

  • The depth module predicts the depth map for the current frame from the last captured frames, which is added to a point cloud that represents the scene.
  • This point cloud is used by the volume occupancy module to predict a volume occupancy field...
  • ...Which is in turn used by the surface coverage gain module to compute the surface coverage gain of a given camera pose.
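The data flow between the three modules can be sketched as follows. This is a toy pipeline with placeholder functions standing in for the actual neural networks (the camera model, names, and heuristics here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def depth_module(frames):
    # Placeholder: predict a depth map for the current frame from recent RGB frames.
    h, w = frames[-1].shape[:2]
    return np.ones((h, w))  # dummy constant depth

def backproject(depth, fx=1.0, fy=1.0):
    # Lift each pixel to a 3D point with a pinhole model (identity pose for brevity).
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (xs - w / 2) * depth / fx
    y = (ys - h / 2) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def volume_occupancy_module(point_cloud, query_points):
    # Placeholder: occupancy decays with distance to the reconstructed surface.
    d = np.linalg.norm(query_points[:, None] - point_cloud[None], axis=-1).min(axis=1)
    return np.exp(-d)

def surface_coverage_gain(occupancy_field):
    # Placeholder: score a candidate pose by how much uncertain volume it could resolve.
    return float(np.mean(occupancy_field * (1.0 - occupancy_field)))

# One pipeline pass on a dummy 4x4 RGB frame
frames = [np.zeros((4, 4, 3))]
depth = depth_module(frames)                      # depth module
cloud = backproject(depth)                        # point cloud of the scene
queries = np.random.default_rng(0).uniform(-2, 2, size=(32, 3))
occ = volume_occupancy_module(cloud, queries)     # volume occupancy module
gain = surface_coverage_gain(occ)                 # surface coverage gain module
print(gain >= 0.0)  # True
```

The point is the chaining: depth prediction feeds the point cloud, the point cloud conditions the occupancy field, and the occupancy field scores candidate camera poses.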

2. Online training: Collect data and Replay from Memory


During online exploration, we perform a training iteration at each time step t, which consists of three steps.

  1. First, during the Decision Making step, we select the next best camera pose to explore the scene by running our three modules as previously described.
  2. Second, during the Data Collection & Memory Building step, the camera moves toward the previously predicted pose, creating self-supervision signals for all three modules that are stored in the Memory.
  3. Third and last, the Memory Replay step randomly selects supervision data stored in the Memory and updates the weights of each of the three modules.
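The three steps above can be sketched as a single training iteration. This is a structural sketch only: the Memory is reduced to a fixed-size replay buffer, and the three callables (`select_nbv`, `collect_supervision`, `update_modules`) are hypothetical stand-ins for the actual modules and optimizers:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of self-supervision samples (simplified sketch)."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def store(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        # Draw a random batch, never more than the buffer currently holds.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def online_training_step(t, memory, select_nbv, collect_supervision, update_modules):
    # 1. Decision Making: run the three modules to pick the next best camera pose.
    pose = select_nbv(t)
    # 2. Data Collection & Memory Building: move toward the predicted pose and
    #    store the resulting self-supervision signals in the Memory.
    for sample in collect_supervision(pose):
        memory.store(sample)
    # 3. Memory Replay: update module weights on a random batch from the Memory.
    batch = memory.sample(batch_size=4)
    update_modules(batch)
    return pose

# Dummy callables to exercise one iteration
memory = ReplayMemory(capacity=100)
select_nbv = lambda t: ("pose", t)
collect_supervision = lambda pose: [{"pose": pose, "signal": i} for i in range(3)]
updates = []
pose = online_training_step(0, memory, select_nbv, collect_supervision, updates.append)
print(len(memory.buffer))  # 3
```

Replaying random past samples rather than only the latest capture is what lets the modules keep learning online without catastrophically forgetting earlier parts of the scene.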

Exploration of large 3D scenes (dataset from SCONE, NeurIPS22, Guédon et al.)

Statue of Liberty (3D mesh by Brian Trepanier)  

results/trajectories/liberty.png Trajectory in GT scene
results/reconstructions/liberty.png Reconstructed surface

Pisa Cathedral (3D mesh by Brian Trepanier)  

results/trajectories/pisa.png Trajectory in GT scene
results/reconstructions/pisa.png Reconstructed surface

Alhambra Palace (3D mesh by Brian Trepanier)  

results/trajectories/alhambra.png Trajectory in GT scene
results/reconstructions/alhambra.png Reconstructed surface

Pantheon (3D mesh by Brian Trepanier)  

results/trajectories/pantheon.png Trajectory in GT scene
results/reconstructions/pantheon.png Reconstructed surface

Eiffel Tower (3D mesh by Brian Trepanier)  

results/trajectories/eiffel.png Trajectory in GT scene
results/reconstructions/eiffel.png Reconstructed surface

Neuschwanstein Castle (3D mesh by Brian Trepanier)  

results/trajectories/neus.png Trajectory in GT scene
results/reconstructions/neus.png Reconstructed surface

Colosseum (3D mesh by Brian Trepanier)  

results/trajectories/colosseum.png Trajectory in GT scene
results/reconstructions/colosseum.png Reconstructed surface

Manhattan Bridge (3D mesh by Brian Trepanier)  

results/trajectories/manhattan.png Trajectory in GT scene
results/reconstructions/manhattan.png Reconstructed surface

Fushimi Castle (3D mesh by Andrea Spognetta)  

results/trajectories/fushimi.png Trajectory in GT scene
results/reconstructions/fushimi.png Reconstructed surface

Christ the Redeemer (3D mesh by Brian Trepanier)  

results/trajectories/redeemer.png Trajectory in GT scene
results/reconstructions/redeemer.png Reconstructed surface

Dunnottar Castle (3D mesh by Andrea Spognetta)  

results/trajectories/dunnottar.png Trajectory in GT scene
results/reconstructions/dunnottar.png Reconstructed surface

Bannerman Castle (3D mesh by Andrea Spognetta)  

results/trajectories/bannerman.png Trajectory in GT scene
results/reconstructions/bannerman.png Reconstructed surface

CVPR 2023 Presentation









If you find this work useful for your research, please cite:

    @inproceedings{guedon2023macarons,
        title={MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision},
        author={Gu{\'e}don, Antoine and Monnier, Tom and Monasse, Pascal and Lepetit, Vincent},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2023}
    }

Further information

If you like this project, check out our previous work on NBV: SCONE (NeurIPS 2022).


This work was granted access to the HPC resources of IDRIS under the allocation 2022-AD011013387 made by GENCI. We thank Elliot Vincent for inspiring discussions and valuable feedback on the manuscript.

The template webpage is inspired by the (awesome) work UNICORN (ECCV 2022) by Monnier et al.

© You are welcome to copy the code, please attribute the source with a link back to this page and the UNICORN webpage by Tom Monnier and remove the analytics.