Tom Monnier

Research Scientist at Meta

I am a Research Scientist at Meta working on computer vision and 3D modeling. I did my PhD in the amazing Imagine lab at ENPC under the guidance of Mathieu Aubry. During my PhD, I was fortunate to work with Jean Ponce (Inria), Matthew Fisher (Adobe Research), Alyosha Efros and Angjoo Kanazawa (UC Berkeley). Before that, I completed my engineer's degree (equivalent to an M.Sc.) at Mines Paris.

My research mainly focuses on learning from images without annotations, through self-supervised and unsupervised methods (see representative papers). I am always looking for PhD interns; feel free to reach out!

email | github | google scholar | twitter

Publications

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives
Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei A. Efros, Mathieu Aubry
NeurIPS 2023
paper | webpage | code | slides | bibtex

We compute a primitive-based 3D reconstruction from multiple views by optimizing textured superquadric meshes with learnable transparency.

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-supervision
Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit
CVPR 2023
paper | webpage | code | video | slides | bibtex

We introduce MACARONS, a method that learns in a self-supervised fashion to explore new environments and reconstruct them in 3D using RGB images only.

The Learnable Typewriter: A Generative Approach to Text Line Analysis
Ioannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry
arXiv 2023
paper | webpage | code | bibtex

We build upon sprite-based image decomposition approaches to design a generative method for character analysis and recognition in text lines.

Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
Monika Wysoczanska, Tom Monnier, Tomasz Trzcinski, David Picard
NeurIPS Workshops 2022
paper | bibtex

A Transformer-based framework to evaluate off-the-shelf features (object-centric and dense representations) on the visual reasoning task of VQA.

Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
Tom Monnier, Matthew Fisher, Alexei A. Efros, Mathieu Aubry
ECCV 2022
paper | webpage | code | video | slides | bibtex

We present UNICORN, a self-supervised approach leveraging the consistency across different single-view images for high-quality 3D reconstructions.

Representing Shape Collections with Alignment-Aware Linear Models
Romain Loiseau, Tom Monnier, Mathieu Aubry, Loïc Landrieu
3DV 2021
paper | webpage | code | bibtex

We characterize 3D shapes as affine transformations of linear families learned without supervision, and showcase the advantages of this representation on large shape collections.

Unsupervised Layered Image Decomposition into Object Prototypes
Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry
ICCV 2021
paper | webpage | code | video | slides | bibtex

We discover the objects recurrent in unlabeled image collections by modeling images as a composition of learnable sprites.
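
As a toy illustration of the compositional model (not the paper's implementation, which learns the sprites, their transformations, and occlusion ordering end to end), an image can be formed by alpha-blending an ordered stack of sprite layers onto a background:

```python
import numpy as np

def composite_layers(background, sprites, alphas):
    """Alpha-blend an ordered stack of sprite layers (back to front) onto a background.
    background: (H, W, 3) image; sprites: list of (H, W, 3) RGB layers;
    alphas: list of (H, W) opacity masks in [0, 1]."""
    canvas = background.astype(float).copy()
    for rgb, alpha in zip(sprites, alphas):
        a = alpha[..., None]                   # broadcast the mask over RGB channels
        canvas = a * rgb + (1.0 - a) * canvas  # standard "over" compositing
    return canvas
```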

Deep Transformation-Invariant Clustering
Tom Monnier, Thibault Groueix, Mathieu Aubry
NeurIPS 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

A simple adaptation of K-means to make it work on pixels! We align prototypes to each sample image before computing cluster distances.
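
To make the idea concrete, here is a minimal NumPy sketch of transformation-invariant K-means, assuming a (N, H, W) float image stack and using brute-force 2D translations as the transformation family (the paper instead learns richer differentiable transformations end to end):

```python
import numpy as np

def best_shift_distance(image, prototype, max_shift=3):
    """Align the prototype to the image over small 2D translations and
    return (smallest squared L2 distance, best shift). Toy stand-in for
    the learned differentiable transformations used in the paper."""
    best = (np.inf, (0, 0))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(prototype, (dy, dx), axis=(0, 1))
            d = np.sum((image - shifted) ** 2)
            if d < best[0]:
                best = (d, (dy, dx))
    return best

def dti_kmeans(images, k, n_iters=10, seed=0):
    """Transformation-invariant K-means on an image stack of shape (N, H, W):
    prototypes are aligned to each sample before cluster distances are computed."""
    rng = np.random.default_rng(seed)
    prototypes = images[rng.choice(len(images), k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels, aligned = [], [[] for _ in range(k)]
        for img in images:
            dists_shifts = [best_shift_distance(img, p) for p in prototypes]
            j = int(np.argmin([d for d, _ in dists_shifts]))
            dy, dx = dists_shifts[j][1]
            labels.append(j)
            # Bring the sample back into the prototype's frame before averaging.
            aligned[j].append(np.roll(img, (-dy, -dx), axis=(0, 1)))
        for j in range(k):
            if aligned[j]:
                prototypes[j] = np.mean(aligned[j], axis=0)
    return np.array(labels), prototypes
```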

docExtractor: An off-the-shelf historical document element extraction
Tom Monnier, Mathieu Aubry
ICFHR 2020 (oral presentation)
paper | webpage | code | video | slides | bibtex

Leveraging synthetic training data to efficiently extract visual elements from historical document images.

© You are welcome to copy the code; please attribute the source with a link back to this page.
Template inspired by [1], [2], [3]. Misspellings: monier, monnie, monie, monniert.

Last updated: October 2023