Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

arXiv 2022

Tom Monnier¹, Matthew Fisher², Alexei A. Efros³, Mathieu Aubry¹

¹LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS
²Adobe Research
³UC Berkeley

Abstract


Approaches to single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all of these forms of supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy that gradually specializes the model from category to instances in a curriculum learning fashion; and (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture, which decomposes an image into explicit shape, texture, pose, and background; an adapted formulation of differentiable rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset (the classical benchmark for methods requiring multiple views as supervision) and on standard real-image benchmarks (Pascal3D+ Car, CUB-200), for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object.
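To make the swap reconstruction idea concrete, here is a minimal, hypothetical sketch in plain Python, not the UNICORN implementation. It assumes that latent codes of other instances are kept in a memory bank (`bank`), that the neighbor is picked by Euclidean distance in code space, and that `toy_render` stands in for a differentiable renderer; all of these names are illustrative. The chosen factor (shape or texture) is replaced by the neighbor's code, the image is re-rendered with every other factor unchanged, and the result is compared to the input.

```python
# Hypothetical sketch of a swap-reconstruction loss; names are illustrative.
from typing import Callable, Dict, List, Sequence

Codes = Dict[str, List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two latent codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(query: Sequence[float], bank: Sequence[Sequence[float]]) -> int:
    """Index of the code in `bank` closest to `query`."""
    return min(range(len(bank)), key=lambda i: squared_distance(query, bank[i]))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def swap_reconstruction_loss(
    image: Sequence[float],
    codes: Codes,
    bank: Sequence[Sequence[float]],
    factor: str,
    render: Callable[[Codes], List[float]],
) -> float:
    """Replace codes[factor] by its nearest neighbor among other instances,
    re-render with all other factors unchanged, and compare to the input."""
    j = nearest_neighbor(codes[factor], bank)
    swapped = dict(codes)  # shallow copy: only `factor` is replaced
    swapped[factor] = list(bank[j])
    return mse(render(swapped), image)


def toy_render(codes: Codes) -> List[float]:
    # Hypothetical stand-in for a differentiable renderer:
    # each "pixel" is shape * texture + background.
    return [s * t + b for s, t, b in
            zip(codes["shape"], codes["texture"], codes["background"])]


codes = {"shape": [1.0, 2.0], "texture": [0.5, 0.5], "background": [0.0, 0.0]}
image = toy_render(codes)
bank = [[1.0, 2.1], [5.0, -3.0]]  # shape codes of two *other* instances
loss = swap_reconstruction_loss(image, codes, bank, "shape", toy_render)
print(loss)  # about 0.00125: the neighbor's shape code is close to ours
```

The loss stays low only if swapping in a similar instance's shape (or texture) still explains the input image, which is one way to read the abstract's "consistency between instances having similar shape or texture".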

UNICORN 🦄 - UNsupervised cross-Instance COnsistency for 3D ReconstructioN


Structured autoencoding. Given an input image, we predict parameters that are decoded into four factors (shape, texture, pose, background) and composed to generate the output image. Progressive conditioning is marked in the figure.
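The caption's pipeline can be illustrated with a small, hypothetical sketch; it is not UNICORN's code. It assumes that progressive conditioning can be realized by masking latent-code dimensions, with the number of active dimensions growing on a linear schedule, and that the rendered foreground is alpha-composited over the predicted background. The functions `active_dims_at`, `progressive_mask`, and `composite`, and the schedule itself, are illustrative assumptions.

```python
# Hypothetical sketch: progressive code masking plus alpha compositing.
from typing import List, Sequence


def active_dims_at(step: int, total_dims: int, steps_per_dim: int) -> int:
    """Hypothetical linear schedule: unlock one code dimension every
    `steps_per_dim` training steps, up to `total_dims`."""
    return min(total_dims, step // steps_per_dim)


def progressive_mask(code: Sequence[float], active_dims: int) -> List[float]:
    """Zero all but the first `active_dims` entries of a latent code.

    With active_dims == 0, every instance gets the same code (a
    category-level model); raising it lets codes specialize per instance.
    """
    return [x if i < active_dims else 0.0 for i, x in enumerate(code)]


def composite(foreground: Sequence[float], alpha: Sequence[float],
              background: Sequence[float]) -> List[float]:
    """Alpha-composite a rendered foreground over a predicted background."""
    return [a * f + (1.0 - a) * b
            for f, a, b in zip(foreground, alpha, background)]


shape_code = [0.3, -1.2, 0.8, 2.0]
print(progressive_mask(shape_code, active_dims_at(0, 4, 1000)))     # [0.0, 0.0, 0.0, 0.0]
print(progressive_mask(shape_code, active_dims_at(2500, 4, 1000)))  # [0.3, -1.2, 0.0, 0.0]
print(composite([1.0, 1.0], [1.0, 0.0], [0.2, 0.2]))                # [1.0, 0.2]
```

Masking trailing dimensions keeps early training at the category level, since all instances decode from the same code, and then lets instance-specific detail in gradually, matching the curriculum described in the abstract.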

Results - 3D reconstruction from a single image


ShapeNet renderings

CompCars

CUB-200

LSUN Motorbike

Results - Segmentation and correspondences


CompCars

CUB-200

LSUN Motorbike

Resources


Paper

Code

Slides

BibTeX

If you find this work useful for your research, please cite:
@article{monnier2022unicorn,
  title={{Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency}},
  author={Monnier, Tom and Fisher, Matthew and Efros, Alexei A and Aubry, Mathieu},
  journal={arXiv:2204.10310 [cs]},
  year={2022},
}

Further information


If you like this project, check out related works from our group:

Acknowledgements


We thank François Darmon for inspiring discussions; Robin Champenois, Romain Loiseau, Elliot Vincent for feedback on the manuscript; and Michael Niemeyer, Shubham Goel for details on the evaluation. This work was supported in part by ANR project EnHerit ANR-17-CE23-0008, project Rapid Tabasco, gifts from Adobe and HPC resources from GENCI-IDRIS (2021-AD011011697R1).
