Unsupervised Layered Image Decomposition into Object Prototypes

ICCV 2021

Tom Monnier¹  Elliot Vincent¹,²  Jean Ponce²  Mathieu Aubry¹

¹LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
²Inria, École normale supérieure, CNRS, PSL Research University, Paris, France

Paper Code BibTeX


We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models. Contrary to recent approaches that model image layers with autoencoder networks, we represent them as explicit transformations of a small set of prototypical images. Our model has three main components: (i) a set of object prototypes in the form of learnable images with a transparency channel, which we refer to as sprites; (ii) differentiable parametric functions predicting occlusions and transformation parameters necessary to instantiate the sprites in a given image; (iii) a layered image formation model with occlusion for compositing these instances into complete images including background. By jointly learning the sprites and occlusion/transformation predictors to reconstruct images, our approach not only yields accurate layered image decompositions, but also identifies object categories and instance parameters. We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks (Tetrominoes, Multi-dSprites, CLEVR6). We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images. To the best of our knowledge, this is the first time that a layered image decomposition algorithm learns an explicit and shared concept of objects, and is robust enough to be applied to real images.
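To make the sprite-instantiation step of the abstract concrete, here is a minimal NumPy sketch of placing an RGBA sprite onto a larger canvas through an affine map. The function name `instantiate_sprite` and the parameters `A` (linear part) and `t` (translation) are illustrative only; the paper's model uses differentiable spatial-transformer-style modules with learned parameters, not the nearest-neighbour sampling shown here.

```python
import numpy as np

def instantiate_sprite(sprite, A, t, out_shape):
    """Place an RGBA sprite into a canvas of shape out_shape=(H, W) by
    applying the inverse of the affine map x -> A @ x + t, with
    nearest-neighbour sampling. Pixels mapping outside the sprite stay
    fully transparent (zero alpha)."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)
    # Map each output pixel back into sprite coordinates.
    src = (coords - t) @ np.linalg.inv(A).T
    src = np.rint(src).astype(int)
    h, w = sprite.shape[:2]
    valid = (src[:, 0] >= 0) & (src[:, 0] < w) & \
            (src[:, 1] >= 0) & (src[:, 1] < h)
    out = np.zeros((H, W, 4))
    flat = out.reshape(-1, 4)  # view into out, row-major like coords
    flat[valid] = sprite[src[valid, 1], src[valid, 0]]
    return out
```

With `A` the identity and `t = (1, 1)`, a 2x2 sprite lands at offset (1, 1) in the canvas, untouched pixels remaining transparent.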



Overview. Given an input image x (highlighted in red), we predict for each layer l the transformations T^{spr}_{\nu_{lk}} and T^{lay}_{\eta_{l}} to apply to the set of sprites {s_1, ..., s_k} that best reconstruct the input. Transformed sprites and transformed background can be composed through C_\delta into many possible reconstructions given a predicted occlusion matrix \delta. We introduce a greedy algorithm to select the best reconstruction, highlighted in green.
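A toy sketch of the compositing and selection steps described above, assuming sprites carry an alpha channel: layers are alpha-composited back to front, and the reconstruction closest to the input is kept. For brevity this sketch searches exhaustively over layer orderings rather than implementing the paper's greedy occlusion-matrix procedure; `composite` and `best_reconstruction` are illustrative names, not the authors' API.

```python
import itertools
import numpy as np

def composite(layers, background):
    """Alpha-composite (H, W, 4) RGBA layers over an (H, W, 3) RGB
    background, back to front."""
    out = background.astype(float)
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out

def best_reconstruction(layers, background, target):
    """Try every back-to-front ordering of the transformed sprites and
    keep the composite closest to the target in squared L2 distance.
    (Stand-in for the paper's greedy selection over occlusion
    configurations, kept exhaustive here for brevity.)"""
    best, best_err = None, np.inf
    for order in itertools.permutations(range(len(layers))):
        recon = composite([layers[i] for i in order], background)
        err = float(np.sum((recon - target) ** 2))
        if err < best_err:
            best, best_err = recon, err
    return best, best_err
```

For two fully opaque layers, the search simply picks the ordering that puts the layer matching the target on top, giving zero reconstruction error.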


Multi-object synthetic benchmarks
(we urge visitors to click HERE for additional random decomposition results)


Multi-object discovery results. From left to right: inputs, reconstructions, semantic segmentation (each color corresponds to a different sprite), object instance segmentation (each color corresponds to a different object instance), and the first decomposition layers, color-coded to match the corresponding instance mask.


Object-centric image manipulation. Given a query image (top left) from CLEVR6, we show the closest reconstruction (top right) and several image manipulations (next four rows). From top to bottom, we respectively use different sprites, change the objects' colors, vary their positions, and modify their scale.

Weizmann Horse Database


Weizmann Horse Database cosegmentation. Sprite and mask (left) learned from the Weizmann Horse Database, and example results (right) showing, for each input, its reconstruction, the layered composition (background, foreground, mask) and the extracted foreground.

Unfiltered Instagram collections


Discovered sprites. We show the 8 qualitatively best sprites among the 40 discovered from Instagram collections of more than 15k photos linked to the #santaphoto and #weddingkiss hashtags.


Decomposition results. We show decomposition results for #santaphoto input samples represented by one of the top 8 sprites shown above. Note how the foreground layers differ from the original sprites.







If you find this work useful for your research, please cite:
@inproceedings{monnier2021dtisprites,
  title={{Unsupervised Layered Image Decomposition into Object Prototypes}},
  author={Monnier, Tom and Vincent, Elliot and Ponce, Jean and Aubry, Mathieu},
  booktitle={ICCV},
  year={2021},
}

Further information

If you like this project, check out related works on deep transformations from our group:


We thank François Darmon, Hugo Germain and David Picard for valuable feedback. This work was supported in part by: the French government under management of Agence Nationale de la Recherche as part of the project EnHerit ANR-17-CE23-0008 and the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute); project Rapid Tabasco; gifts from Adobe; the Louis Vuitton/ENS Chair in Artificial Intelligence; the Inria/NYU collaboration; and HPC resources from GENCI-IDRIS (Grant 2020-AD011011697).

© You are welcome to copy this website's code for your personal use, please attribute the source with a link back to this page and remove the analytics code in the header.