
Thibault Groueix

thibault.groueix.2012 at polytechnique.org


Short bio: Thibault Groueix is a research engineer at Adobe Research. Previously, he was a research scientist at Naver Labs Europe. He obtained his PhD in the Imagine team at Ecole Nationale des Ponts et Chaussees. His thesis, advised by Mathieu Aubry, received second-place PhD awards from SIF and AFIA. His research focuses on 3D deep learning, specifically 3D deformation of surfaces and 3D reconstruction of humans.


Internships for PhD students: I am regularly looking for interns to join us at Adobe Research for a summer internship. If you are interested in interning with me, don't be shy and send me an e-mail with your CV, research interests, and a short description of potential topics you would like to work on. I am currently looking for interns for summer 2024 and plan to work on projects that leverage 2D foundation models for 3D tasks.


Pro-bono mentoring: I am happy to discuss anything related to your research career, especially if you belong to an underrepresented group in STEM. You can schedule a meeting here. Topics can range from PhD applications and PhD life to research internships, full-time jobs, and so on. I studied in France, so I may have a hard time with questions specific to US universities. You could also try reaching out to Matheus here about US-specific queries.


News


Research

See Google Scholar profile for a full list of publications.

PSDR-Room: Single Photo to Scene using Differentiable Rendering
K. Yan, F. Luan, M. Hašan, T. Groueix, V. Deschaintre, S. Zhao
SIGGRAPH Asia 2023.
@InProceedings{PSDR-Room_Yan_2023,
 author = {Yan, Kai and Luan, Fujun and Hašan, Miloš and Groueix, Thibault and Deschaintre, Valentin and Zhao, Shuang},
 title = {PSDR-Room: Single Photo to Scene using Differentiable Rendering},
 booktitle = {ACM Transactions on Graphics (SIGGRAPH Asia)},
 year = {2023},
 }

A 3D digital scene contains many components: lights, materials and geometries, interacting to reach the desired appearance. Staging such a scene is time-consuming and requires both artistic and technical skills. In this work, we propose PSDR-Room, a system that optimizes the lighting as well as the pose and materials of individual objects to match a target image of a room scene, with minimal user input. To this end, we leverage a recent path-space differentiable rendering approach that provides unbiased gradients of the rendering with respect to geometry, lighting, and procedural materials, allowing us to optimize all of these components using gradient descent to visually match the input photo appearance. We use recent single-image scene understanding methods to initialize the optimization and search for appropriate 3D models and materials. We evaluate our method on real photographs of indoor scenes and demonstrate the editability of the resulting scene components.
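The optimization at the heart of such a system is gradient descent through a differentiable renderer. The following is a minimal PyTorch sketch of that idea only, not the paper's pipeline: toy_render is a hypothetical stand-in for the path-space differentiable renderer, and the scene is reduced to a 2D pose offset, an RGB albedo and a scalar light intensity.

import torch

def toy_render(pose, albedo, light):
    """Hypothetical differentiable 'renderer': a 3-channel image computed from
    a 2D pose offset, an RGB albedo and a scalar light intensity.
    A real system would call a path-space differentiable renderer here."""
    h = w = 64
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    # A soft disk whose position depends on the pose and whose color and
    # brightness depend on the material and lighting parameters.
    mask = torch.sigmoid(10.0 * (0.25 - ((xs - pose[0]) ** 2 + (ys - pose[1]) ** 2)))
    return light * albedo.view(3, 1, 1) * mask.unsqueeze(0)

# Target photo (synthesized here from known parameters, for the sake of the demo).
with torch.no_grad():
    target = toy_render(torch.tensor([0.3, -0.2]), torch.tensor([0.8, 0.4, 0.2]), torch.tensor(1.2))

# Scene parameters to optimize; in the paper they are initialized from
# single-image scene understanding rather than from defaults.
pose = torch.zeros(2, requires_grad=True)
albedo = torch.full((3,), 0.5, requires_grad=True)
light = torch.tensor(1.0, requires_grad=True)

opt = torch.optim.Adam([pose, albedo, light], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(toy_render(pose, albedo, light), target)
    loss.backward()      # gradients flow through the (differentiable) renderer
    opt.step()
print(pose.detach(), albedo.detach(), light.detach())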

3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets
T. Cheng, M. Gadelha, S. Pirk, T. Groueix, R. Mech, A. Markham, N. Trigoni
ICCV 2023.
@InProceedings{3dminer_Cheng_2023,
 author = {Cheng, Ta-Ying and Gadelha, Matheus and Pirk, Sören and Groueix, Thibault and Mech, Radomir and Markham, Andrew and Trigoni, Niki},
 title = {3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets},
 booktitle = {International Conference on Computer Vision (ICCV)},
 year = {2023},
 }

We present 3DMiner - a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset.
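The grouping stage can be sketched as plain feature clustering; below, random vectors stand in for embeddings from a frozen self-supervised encoder (e.g. DINO-style), and the dataset size and cluster count are arbitrary choices.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Stand-in for self-supervised image embeddings (one row per image);
# in the real pipeline these would come from a frozen DINO-like encoder.
rng = np.random.default_rng(0)
features = rng.normal(size=(2_000, 384)).astype(np.float32)

# L2-normalize so that Euclidean k-means behaves like cosine clustering.
features = normalize(features)

kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Each cluster is then treated as an image collection of one shape, on which
# pairwise correspondences and progressive bundle adjustment are run.
cluster_0 = np.flatnonzero(labels == 0)
print(f"{len(cluster_0)} images assigned to cluster 0")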

CNOS: A Strong Baseline for CAD-based Novel Object Segmentation
V. N. Nguyen, T. Hodaň, G. Ponimatkin, T. Groueix, V. Lepetit
ICCV Workshop, BOP challenge 2023.
@InProceedings{cnos_Nguyen_2023,
 author = {Nguyen, Van Nguyen and Hodaň, Tomáš and Ponimatkin, Georgy and Groueix, Thibault and Lepetit, Vincent},
 title = {CNOS: A Strong Baseline for CAD-based Novel Object Segmentation},
 booktitle = {ICCV Workshop, BOP challenge},
 year = {2023},
 }

We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along with modal masks. We experimentally demonstrate that our method achieves state-of-the-art results in CAD-based novel object segmentation, surpassing existing approaches on the seven core datasets of the BOP challenge by 19.8% AP using the same BOP evaluation protocol. Our source code is available at this https URL.
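The assignment of object IDs boils down to nearest-neighbour matching between proposal descriptors and descriptors of rendered CAD templates. A minimal sketch, with random tensors standing in for DINOv2 descriptors and with hypothetical numbers of objects, views and proposals:

import torch
import torch.nn.functional as F

num_objects, views_per_object, dim = 30, 42, 1024   # hypothetical sizes
num_proposals = 50

# Reference descriptors: one descriptor per rendered view of each CAD model.
ref = F.normalize(torch.randn(num_objects, views_per_object, dim), dim=-1)
# Descriptors of the segmentation proposals produced for the input RGB image.
prop = F.normalize(torch.randn(num_proposals, dim), dim=-1)

# Cosine similarity between every proposal and every reference view.
sim = torch.einsum("pd,ovd->pov", prop, ref)          # (proposals, objects, views)
per_object_score = sim.max(dim=-1).values             # best view per object
object_id = per_object_score.argmax(dim=-1)           # object assigned to each proposal
confidence = per_object_score.max(dim=-1).values

print(object_id[:5], confidence[:5])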

TextDeformer: Geometry Manipulation using Text Guidance
W. Gao, N. Aigerman, T. Groueix, V. G. Kim, R. Hanocka
SIGGRAPH 2023.
@InProceedings{textdeformer_Gao_2023,
 author = {Gao, William and Aigerman, Noam and Groueix, Thibault and Kim, Vladimir G. and Hanocka, Rana},
 title = {TextDeformer: Geometry Manipulation using Text Guidance},
 booktitle = {ACM Transactions on Graphics (SIGGRAPH)},
 year = {2023},
 }

We present a technique for automatically producing a deformation of an input triangle mesh, guided solely by a text prompt. Our framework is capable of deformations that produce both large, low-frequency shape changes, and small high-frequency details. Our framework relies on differentiable rendering to connect geometry to powerful pre-trained image encoders, such as CLIP and DINO. Notably, updating mesh geometry by taking gradient steps through differentiable rendering is notoriously challenging, commonly resulting in deformed meshes with significant artifacts. These difficulties are amplified by noisy and inconsistent gradients from CLIP. To overcome this limitation, we opt to represent our mesh deformation through Jacobians, which updates deformations in a global, smooth manner (rather than locally-sub-optimal steps). Our key observation is that Jacobians are a representation that favors smoother, large deformations, leading to a global relation between vertices and pixels, and avoiding localized noisy gradients. Additionally, to ensure the resulting shape is coherent from all 3D viewpoints, we encourage the deep features computed on the 2D encoding of the rendering to be consistent for a given vertex from all viewpoints. We demonstrate that our method is capable of smoothly-deforming a wide variety of source mesh and target text prompts, achieving both large modifications to, e.g., body proportions of animals, as well as adding fine semantic details, such as shoe laces on an army boot and fine details of a face.

Neural Face Rigging for Animating and Retargeting Facial Meshes in the Wild
D. Qin, J. Saito, N. Aigerman, T. Groueix, T. Komura
SIGGRAPH 2023.
@InProceedings{nfr_Qin_2023,
 author = {Qin, Dafei and Saito, Jun and Aigerman, Noam and Groueix, Thibault and Komura, Taku},
 title = {Neural Face Rigging for Animating and Retargeting Facial Meshes in the Wild},
 booktitle = {ACM Transactions on Graphics (SIGGRAPH)},
 year = {2023},
 }

We propose an end-to-end deep-learning approach for automatic rigging and retargeting of 3D models of human faces in the wild. Our approach, called Neural Face Rigging (NFR), holds three key properties: (i) NFR's expression space maintains human-interpretable editing parameters for artistic controls; (ii) NFR is readily applicable to arbitrary facial meshes with different connectivity and expressions; (iii) NFR can encode and produce fine-grained details of complex expressions performed by arbitrary subjects. To the best of our knowledge, NFR is the first approach to provide realistic and controllable deformations of in-the-wild facial meshes, without the manual creation of blendshapes or correspondence. We design a deformation autoencoder and train it through a multi-dataset training scheme, which benefits from the unique advantages of two data sources: a linear 3DMM with interpretable control parameters as in FACS, and 4D captures of real faces with fine-grained details. Through various experiments, we show NFR's ability to automatically produce realistic and accurate facial deformations across a wide range of existing datasets as well as noisy facial scans in-the-wild, while providing artist-controlled, editable parameters.

NOPE: Novel Object Pose Estimation from a Single Image
V. N. Nguyen, T. Groueix, Y. Hu, M. Salzmann, V. Lepetit
Arxiv 2023.
@InProceedings{nope_Nguyen_2023,
 author = {Nguyen, Van Nguyen and Groueix, Thibault and Hu, Yinlin and Salzmann, Mathieu and Lepetit, Vincent},
 title = {NOPE: Novel Object Pose Estimation from a Single Image},
 booktitle = {arXiv preprint},
 year = {2023},
 }

The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model and without requiring training time for new objects and categories. We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object. This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference. We compare our approach to state-of-the-art methods and show it outperforms them both in terms of accuracy and robustness.
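Concretely, the relative pose can be read off as the candidate viewpoint whose predicted embedding best matches the embedding of the query image. A minimal retrieval sketch, with a hypothetical viewpoint grid and random vectors standing in for the network's outputs:

import torch
import torch.nn.functional as F

# Hypothetical discretization of relative viewpoints (azimuth/elevation grid).
azimuths = torch.arange(0, 360, 10.0)
elevations = torch.arange(-60, 61, 10.0)
viewpoints = torch.cartesian_prod(azimuths, elevations)      # (V, 2)

dim = 256
# Embeddings the pose-conditioned network would predict for each candidate
# viewpoint, and the embedding extracted from the query image (random stand-ins).
view_embeddings = F.normalize(torch.randn(len(viewpoints), dim), dim=-1)
query_embedding = F.normalize(torch.randn(dim), dim=-1)

# The estimated relative pose is the nearest neighbour in embedding space.
scores = view_embeddings @ query_embedding                   # cosine similarities
best = scores.argmax()
print("estimated relative pose (azimuth, elevation):", viewpoints[best].tolist())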

PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
F. Baradel, R. Bregier, T. Groueix, P. Weinzaepfel, Y. Kalantidis, G. Rogez
TPAMI 2022.
@InProceedings{posebertpami_Baradel_2022,
 author = {Baradel, Fabien and Bregier, Romain and Groueix, Thibault and Weinzaepfel, Philippe and Kalantidis, Yannis and Rogez, Gregory},
 title = {PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling},
 booktitle = {IEEE transactions on pattern analysis and machine intelligence},
 year = {2022},
 }

Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are really hard and expensive to obtain. Although transformers have been recently utilized for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the currently limited training data available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model leveraging temporal information. We showcase variants of PoseBERT with different inputs varying from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without finetuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performances, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert
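The masked-modeling objective itself is straightforward to write down. The sketch below is not PoseBERT's architecture, only the training idea: a plain TransformerEncoder over a pose sequence in which a random subset of frames is replaced by a learned mask token and must be reconstructed.

import torch
import torch.nn as nn

seq_len, pose_dim, d_model = 64, 72, 256   # e.g. 24 joints x 3 rotation params

class MaskedPoseModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask):
        x = self.embed(poses)
        # Replace masked frames by a learned mask token.
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = self.encoder(x + self.pos)
        return self.head(x)

model = MaskedPoseModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

poses = torch.randn(8, seq_len, pose_dim)        # stand-in for a MoCap batch
mask = torch.rand(8, seq_len) < 0.3              # hide roughly 30% of the frames
pred = model(poses, mask)
loss = nn.functional.mse_loss(pred[mask], poses[mask])   # reconstruct masked frames only
loss.backward()
opt.step()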

Recovering Detail in 3D Shapes Using Disparity Maps
M. Ramirez de Chanlatte, M. Gadelha, T. Groueix, R. Mech
ECCV Workshop, Learning to Generate 3D Shapes and Scenes 2022.
@InProceedings{RecoveringDetails_Chanlatte_2022,
 author = {Ramirez de Chanlatte, Marissa and Gadelha, Matheus and Groueix, Thibault and Mech, Radomir},
 title = {Recovering Detail in 3D Shapes Using Disparity Maps},
 booktitle = {ECCV Workshop, Learning to Generate 3D Shapes and Scenes},
 year = {2022},
 }

We present a fine-tuning method to improve the appearance of 3D geometries reconstructed from single images. We leverage advances in monocular depth estimation to obtain disparity maps and present a novel approach to transforming 2D normalized disparity maps into 3D point clouds by using shape priors to solve an optimization on the relevant camera parameters. After creating a 3D point cloud from disparity, we introduce a method to combine the new point cloud with existing information to form a more faithful and detailed final geometry. We demonstrate the efficacy of our approach with multiple experiments on both synthetic and real images.
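The disparity-to-point-cloud step amounts to resolving the affine ambiguity of monocular disparity and back-projecting pixels with the camera intrinsics. A minimal numpy sketch, where the intrinsics and the scale/shift values are hypothetical constants (in the paper these are the variables optimized against the shape prior):

import numpy as np

def disparity_to_pointcloud(disparity, fx, fy, cx, cy, scale, shift):
    """Back-project a normalized disparity map (H, W) into camera-space 3D points.
    scale and shift resolve the affine ambiguity of monocular disparity; the
    paper recovers them by optimizing against a 3D shape prior."""
    h, w = disparity.shape
    depth = 1.0 / np.clip(scale * disparity + shift, 1e-6, None)   # disparity -> depth
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - cx) / fx * depth
    y = (vs - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example with hypothetical intrinsics and affine parameters.
disp = np.random.rand(480, 640).astype(np.float32)
points = disparity_to_pointcloud(disp, fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                                 scale=2.0, shift=0.5)
print(points.shape)   # (480*640, 3)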

Learning Joint Surface Atlases
T. Deprelle, T. Groueix, N. Aigerman, V. G. Kim, M. Aubry
ECCV Workshop, Learning to Generate 3D Shapes and Scenes 2022.
@InProceedings{jointatlas_Deprelle_2022,
 author = {Deprelle, Theo and Groueix, Thibault and Aigerman, Noam and Kim, Vladimir G. and Aubry, Mathieu},
 title = {Learning Joint Surface Atlases},
 booktitle = {ECCV Workshop, Learning to Generate 3D Shapes and Scenes},
 year = {2022},
 }

This paper describes new techniques for learning atlas-like representations of 3D surfaces, i.e. homeomorphic transformations from a 2D domain to surfaces. Compared to prior work, we propose two major contributions. First, instead of mapping a fixed 2D domain, such as a set of square patches, to the surface, we learn a continuous 2D domain with arbitrary topology by optimizing a point sampling distribution represented as a mixture of Gaussians. Second, we learn consistent mappings in both directions: charts, from the 3D surface to 2D domain, and parametrizations, their inverse. We demonstrate that this improves the quality of the learned surface representation, as well as its consistency in a collection of related shapes. It thus leads to improvements for applications such as correspondence estimation, texture transfer, and consistent UV mapping. As an additional technical contribution, we outline that, while incorporating normal consistency has clear benefits, it leads to issues in the optimization, and that these issues can be mitigated using a simple repulsive regularization. We demonstrate that our contributions provide better surface representation than existing baselines.
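The two ingredients, a learned 2D sampling domain and mappings in both directions, can be sketched as follows. This is a toy illustration, not the paper's implementation: a mixture of Gaussians defines the UV domain, two small MLPs play the roles of the parametrization and the chart, and a cycle loss encourages them to be inverses (the sketch does not show how the sampling distribution itself is optimized).

import torch
import torch.nn as nn
import torch.distributions as D

# Learnable 2D sampling domain represented as a mixture of Gaussians.
K = 8
logits = nn.Parameter(torch.zeros(K))
means = nn.Parameter(torch.rand(K, 2))
log_stds = nn.Parameter(torch.full((K, 2), -2.0))

def sample_domain(n):
    mix = D.Categorical(logits=logits)
    comp = D.Independent(D.Normal(means, log_stds.exp()), 1)
    return D.MixtureSameFamily(mix, comp).sample((n,))      # (n, 2) UV samples

# Parametrization: maps a UV sample plus a shape code to a 3D surface point.
param = nn.Sequential(nn.Linear(2 + 128, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 3))
# Chart: inverse mapping from a 3D surface point (plus shape code) back to UV.
chart = nn.Sequential(nn.Linear(3 + 128, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 2))

shape_code = torch.randn(128)                 # stand-in for a learned shape embedding
uv = sample_domain(1024)
xyz = param(torch.cat([uv, shape_code.expand(1024, -1)], dim=-1))
uv_back = chart(torch.cat([xyz, shape_code.expand(1024, -1)], dim=-1))
cycle_loss = nn.functional.mse_loss(uv_back, uv)   # chart o parametrization should be close to identity
print(xyz.shape, cycle_loss.item())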

Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes
N. Aigerman, K. Gupta, V. G. Kim, S. Chaudhuri, J. Saito, T. Groueix
SIGGRAPH 2022.
@InProceedings{njf_Aigerman_2022,
 author = {Aigerman, Noam and Gupta, Kunal and Kim, Vladimir G. and Chaudhuri, Siddhartha and Saito, Jun and Groueix, Thibault},
 title = {Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes},
 booktitle = {ACM Transactions on Graphics (SIGGRAPH)},
 year = {2022},
 }

This paper introduces a framework designed to accurately predict piecewise linear mappings of arbitrary meshes via a neural network, enabling training and evaluating over heterogeneous collections of meshes that do not share a triangulation, as well as producing highly detail-preserving maps whose accuracy exceeds current state of the art. The framework is based on reducing the neural aspect to a prediction of a matrix for a single given point, conditioned on a global shape descriptor. The field of matrices is then projected onto the tangent bundle of the given mesh, and used as candidate jacobians for the predicted map. The map is computed by a standard Poisson solve, implemented as a differentiable layer with cached pre-factorization for efficient training. This construction is agnostic to the triangulation of the input, thereby enabling applications on datasets with varying triangulations. At the same time, by operating in the intrinsic gradient domain of each individual mesh, it allows the framework to predict highly-accurate mappings. We validate these properties by conducting experiments over a broad range of scenarios, from semantic ones such as morphing, registration, and deformation transfer, to optimization-based ones, such as emulating elastic deformations and contact correction, as well as being the first work, to our knowledge, to tackle the task of learning to compute UV parameterizations of arbitrary meshes. The results exhibit the high accuracy of the method as well as its versatility, as it is readily applied to the above scenarios without any changes to the framework.

Leveraging MoCap Data for Human Mesh Recovery
F. Baradel, T. Groueix, R. Bregier, P. Weinzaepfel, Y. Kalantidis, G. Rogez
3DV 2021.
@InProceedings{LeveragingMoCap_Baradel_2021,
 author = {Baradel, Fabien and Groueix, Thibault and Bregier, Romain and Weinzaepfel, Philippe and Kalantidis, Yannis and Rogez, Gregory},
 title = {Leveraging MoCap Data for Human Mesh Recovery},
 booktitle = {International Conference on 3D Vision (3DV)},
 year = {2021},
 }

Training state-of-the-art models for human body pose and shape recovery from images or videos requires datasets with corresponding annotations that are really hard and expensive to obtain. Our goal in this paper is to study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods. We find that fine-tuning image-based models with synthetic renderings from MoCap data can increase their performance, by providing them with a wider variety of poses, textures and backgrounds. In fact, we show that simply fine-tuning the batch normalization layers of the model is enough to achieve large gains. We further study the use of MoCap data for video, and introduce PoseBERT, a transformer module that directly regresses the pose parameters and is trained via masked modeling. It is simple, generic and can be plugged on top of any state-of-the-art image-based model in order to transform it into a video-based model leveraging temporal information. Our experimental results show that the proposed approaches reach state-of-the-art performance on various datasets including 3DPW, MPI-INF-3DHP, MuPoTS-3D, MCB and AIST. Test code and models will be available soon.
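The batch-normalization-only fine-tuning is easy to reproduce; in the sketch below a torchvision ResNet stands in for the pretrained mesh-recovery backbone, and the loss is a placeholder.

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()        # stand-in for a pretrained human-mesh-recovery backbone

# Freeze everything, then unfreeze only the BatchNorm affine parameters.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=1e-4)
print(sum(p.numel() for p in trainable), "trainable parameters")

# Fine-tuning then proceeds as usual on synthetic renderings of MoCap poses.
images = torch.randn(4, 3, 224, 224)        # stand-in for rendered MoCap images
loss = model(images).pow(2).mean()          # placeholder loss, for illustration only
loss.backward()
opt.step()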

Deep Transformation-Invariant Clustering
T. Monnier, T. Groueix, M. Aubry
NeurIPS 2020. Oral.
@InProceedings{jointatlas_Monnier_2020,
 author = {Monnier, Tom and Groueix, Thibault and Aubry, Mathieu},
 title = {Deep Transformation-Invariant Clustering},
 booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
 year = {2020},
 }

Recent advances in image clustering typically focus on learning better deep representations. In contrast, we present an orthogonal approach that does not rely on abstract features but instead learns to predict image transformations and directly performs clustering in pixel space. This learning process naturally fits in the gradient-based training of K-means and Gaussian mixture model, without requiring any additional loss or hyper-parameters. It leads us to two new deep transformation-invariant clustering frameworks, which jointly learn prototypes and transformations. More specifically, we use deep learning modules that enable us to resolve invariance to spatial, color and morphological transformations. Our approach is conceptually simple and comes with several advantages, including the possibility to easily adapt the desired invariance to the task and a strong interpretability of both cluster centers and assignments to clusters. We demonstrate that our novel approach yields competitive and highly promising results on standard image clustering benchmarks. Finally, we showcase its robustness and the advantages of its improved interpretability by visualizing clustering results over real photograph collections.
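The transformation-invariant objective can be written compactly: prototypes live in pixel space, a small network predicts one spatial transformation per (image, prototype) pair, and each image is assigned to the prototype that reconstructs it best after transformation. The sketch below is a toy version with a single affine-transformation module and arbitrary sizes, not the paper's full set of transformation modules.

import torch
import torch.nn as nn
import torch.nn.functional as F

K, H, W = 10, 32, 32
prototypes = nn.Parameter(torch.rand(K, 1, H, W))    # learnable cluster centers in pixel space

# Predicts, for an input image, one 2x3 affine matrix per prototype (as a residual).
predictor = nn.Sequential(nn.Flatten(), nn.Linear(H * W, 128), nn.ReLU(),
                          nn.Linear(128, K * 6))

def transformed_prototypes(images):
    B = images.shape[0]
    theta = predictor(images).view(B * K, 2, 3)
    # Add the identity transform so the network only predicts a residual.
    theta = theta + torch.tensor([[1., 0., 0.], [0., 1., 0.]])
    grid = F.affine_grid(theta, (B * K, 1, H, W), align_corners=False)
    protos = prototypes.unsqueeze(0).expand(B, -1, -1, -1, -1).reshape(B * K, 1, H, W)
    return F.grid_sample(protos, grid, align_corners=False).view(B, K, 1, H, W)

images = torch.rand(16, 1, H, W)                      # stand-in for an image batch
recon = transformed_prototypes(images)                # (B, K, 1, H, W)
errors = ((recon - images.unsqueeze(1)) ** 2).flatten(2).mean(dim=-1)   # (B, K)
assignments = errors.argmin(dim=1)                    # cluster of each image
loss = errors.min(dim=1).values.mean()                # trains prototypes and predictor jointly
loss.backward()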

Learning elementary structures for 3D shape generation and matching
T. Deprelle, T. Groueix, M. Fisher, V. G. Kim, B. Russell, M. Aubry
NeurIPS 2019.
@InProceedings{elementarystructures_Deprelle_2019,
 author = {Deprelle, Theo and Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
 title = {Learning elementary structures for 3D shape generation and matching},
 booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
 year = {2019},
 }

We propose to represent shapes as the deformation and combination of learnable elementary 3D structures, which are primitives resulting from training over a collection of shapes. We demonstrate that the learned elementary 3D structures lead to clear improvements in 3D shape generation and matching. More precisely, we present two complementary approaches for learning elementary structures: (i) patch deformation learning and (ii) point translation learning. Both approaches can be extended to abstract structures of higher dimensions for improved results. We evaluate our method on two tasks: reconstructing ShapeNet objects and estimating dense correspondences between human scans (FAUST inter challenge). We show 16% improvement over surface deformation approaches for shape reconstruction and outperform FAUST inter challenge state of the art by 6%.

Unsupervised cycle-consistent deformation for shape matching
T. Groueix, M. Fisher, V. G. Kim, B. Russell, M. Aubry
SGP 2019.
@InProceedings{cycleconsistentdeformation_Groueix_2019,
 author = {Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
 title = {Unsupervised cycle-consistent deformation for shape matching},
 booktitle = {Eurographics Symposium on Geometry Processing (SGP)},
 year = {2019},
 }

We propose a self-supervised approach to deep surface deformation. Given a pair of shapes, our algorithm directly predicts a parametric transformation from one shape to the other respecting correspondences. Our insight is to use cycle-consistency to define a notion of good correspondences in groups of objects and use it as a supervisory signal to train our network. Our method does not rely on a template, does not assume near-isometric deformations, and does not require point-correspondence supervision. We demonstrate the efficacy of our approach by using it to transfer segmentation across shapes. We show, on ShapeNet, that our approach is competitive with comparable state-of-the-art methods when annotated training data is readily available, but outperforms them by a large margin in the few-shot segmentation scenario.
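The cycle-consistency signal can be expressed directly as a loss on composed deformations. In the sketch below, Deformer is a toy point-wise MLP conditioned on a max-pooled code of the target shape, not the architecture of the paper; the point is only to show how a three-shape cycle A -> B -> C -> A is penalized for not returning to the start.

import torch
import torch.nn as nn

class Deformer(nn.Module):
    """Toy deformation network: moves each point of a source shape,
    conditioned on a global code of the target shape."""
    def __init__(self, code_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, 3))

    def forward(self, points, target_code):
        code = target_code.expand(points.shape[0], -1)
        return points + self.mlp(torch.cat([points, code], dim=-1))

deformer = Deformer()
encode = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))  # toy shape encoder

# Three stand-in shapes (point clouds) drawn at random.
A, B, C = (torch.randn(1024, 3) for _ in range(3))
code = lambda s: encode(s).max(dim=0).values          # max-pooled global shape code

# Deform A -> B -> C -> back to A; a good deformer should return to the start.
A_to_B = deformer(A, code(B))
B_to_C = deformer(A_to_B, code(C))
C_to_A = deformer(B_to_C, code(A))
cycle_loss = nn.functional.mse_loss(C_to_A, A)
cycle_loss.backward()
print(cycle_loss.item())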

3D-CODED : 3D Correspondences by Deep Deformation
T. Groueix, M. Fisher, V. G. Kim, B. Russell, M. Aubry
ECCV 2018.
@InProceedings{3D-CODED_Groueix_2018,
 author = {Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
 title = {3D-CODED : 3D Correspondences by Deep Deformation},
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2018},
 }

We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks which jointly encode 3D shapes and correspondences. This is achieved by factoring the surface representation into (i) a template, that parameterizes the surface, and (ii) a learnt global feature vector that parameterizes the transformation of the template into the input surface. By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template. We show that these correspondences can be improved by an additional step which improves the shape feature by minimizing the Chamfer distance between the input and transformed template. We demonstrate that our simple approach improves on state-of-the-art results on the difficult FAUST-inter challenge, with an average correspondence error of 2.88cm. We show, on the TOSCA dataset, that our method is robust to many types of perturbations, and generalizes to non-human shapes. This robustness allows it to perform well on real, unclean meshes from the SCAPE dataset.
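The refinement step, minimizing the Chamfer distance between the input and the deformed template, can be sketched as a test-time optimization of the shape feature. Here decoder is a toy stand-in for the learned template-deformation network, and the latent code would in practice be initialized by the encoder rather than at zero.

import torch
import torch.nn as nn

def chamfer(a, b):
    """Symmetric Chamfer distance between two point clouds (N, 3) and (M, 3)."""
    d = torch.cdist(a, b)                        # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

template = torch.randn(2500, 3)                  # stand-in for the human template
decoder = nn.Sequential(nn.Linear(3 + 1024, 512), nn.ReLU(),
                        nn.Linear(512, 512), nn.ReLU(),
                        nn.Linear(512, 3))       # toy stand-in for the deformation decoder

target = torch.randn(2500, 3)                    # input scan to match
latent = torch.zeros(1024, requires_grad=True)   # shape feature predicted by the encoder
opt = torch.optim.Adam([latent], lr=1e-3)

# Test-time refinement: keep the decoder fixed, optimize only the shape feature.
for step in range(200):
    opt.zero_grad()
    deformed = decoder(torch.cat([template, latent.expand(len(template), -1)], dim=-1))
    loss = chamfer(deformed, target)
    loss.backward()
    opt.step()

# Correspondences follow from the template: vertex i of the template maps to deformed[i].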

AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation
T. Groueix, M. Fisher, V. G. Kim, B. Russell, M. Aubry
CVPR 2017. Spotlight, Best Poster Award at PAISS.
@InProceedings{atlasnet_Groueix_2017,
 author = {Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
 title = {AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation},
 booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 year = {2017},
 }

We introduce a method for learning to generate the surface of 3D shapes. Our approach represents a 3D shape as a collection of parametric surface elements and, in contrast to methods generating voxel grids or point clouds, naturally infers a surface representation of the shape. Beyond its novelty, our new shape generation framework, AtlasNet, comes with significant advantages, such as improved precision and generalization capabilities, and the possibility to generate a shape of arbitrary resolution without memory issues. We demonstrate these benefits and compare to strong baselines on the ShapeNet benchmark for two applications: (i) auto-encoding shapes, and (ii) single-view reconstruction from a still image. We also provide results showing its potential for other applications, such as morphing, parametrization, super-resolution, matching, and co-segmentation.
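The surface elements are simply MLPs that map 2D samples from the unit square, concatenated with a shape latent, to 3D points; sampling more UV points yields a denser surface without additional memory during training. A minimal sketch with toy sizes and a random latent standing in for the encoder output:

import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    """Maps 2D samples from the unit square, concatenated with a shape latent,
    to 3D points on one parametric surface element."""
    def __init__(self, latent_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 + latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, 512), nn.ReLU(),
                                 nn.Linear(512, 3), nn.Tanh())

    def forward(self, uv, latent):
        return self.mlp(torch.cat([uv, latent.expand(uv.shape[0], -1)], dim=-1))

num_patches, points_per_patch = 25, 100
decoders = nn.ModuleList(PatchDecoder() for _ in range(num_patches))
latent = torch.randn(1024)        # stand-in for the image / point-cloud encoder output

# Sample each patch at an arbitrary resolution and take the union of the pieces.
surface = torch.cat([dec(torch.rand(points_per_patch, 2), latent) for dec in decoders])
print(surface.shape)              # (2500, 3): a generated point set on the surface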

Interactive Monte-Carlo Ray-Tracing Upsampling
M. Boughida, T. Groueix, T. Boubekeur
Eurographics 2016. Poster.
@InProceedings{bilateralupsampling_Boughida_2016,
 author = {Boughida, Malik and Groueix, Thibault and Boubekeur, Tamy},
 title = {Interactive Monte-Carlo Ray-Tracing Upsampling},
 booktitle = {Eurographics},
 year = {2016},
 }



PhD Thesis

Learning 3D Generation and Matching
Thibault Groueix
Ecole Nationale des Ponts et Chaussees (ENPC), 2020.
AFIA award finalist, Gilles Kahn award finalist
@PHDTHESIS{groueix2018thesis,
 author = {Groueix, Thibault},
 title = {Learning 3D Generation and Matching},
 school = {Ecole Nationale des Ponts et Chaussees},
 year = {2020},
 }

The goal of this thesis is to develop deep learning approaches to model and analyse 3D shapes. Progress in this field could democratize artistic creation of 3D assets which currently requires time and expert skills with technical software. We focus on the design of deep learning solutions for two particular tasks, key to many 3D modeling applications: single-view reconstruction and shape matching.

A single-view reconstruction (SVR) method takes as input a single image and predicts a 3D model of the physical world which produced that image. SVR dates back to the early days of computer vision. In particular, in the 1960s, Lawrence G. Roberts proposed to align simple 3D primitives to an input image, making the assumption that the physical world is made of simple geometric shapes like cuboids. Another approach, proposed by Berthold Horn in the 1970s, is to decompose the input image into intrinsic images and use those to predict the depth of every input pixel. Since several configurations of shapes, texture and illumination can explain the same image, both approaches need to make assumptions on the distribution of textures and 3D shapes to resolve the ambiguity. In this thesis, we learn these assumptions from large-scale datasets instead of manually designing them. Learning SVR also makes it possible to reconstruct complete 3D models, including parts which are not visible in the input image.
Shape matching aims at finding correspondences between 3D objects. Solving this task requires both a local and global understanding of 3D shapes which is hard to achieve. We propose to train neural networks on large-scale datasets to solve this task and capture knowledge implicitly through their internal parameters. Shape matching supports many 3D modeling applications such as attribute transfer, automatic rigging for animation, or mesh editing.

The first technical contribution of this thesis is a new parametric representation of 3D surfaces which we model using neural networks. The choice of data representation is a critical aspect of any 3D reconstruction algorithm. Until recently, most approaches in deep 3D model generation predicted volumetric voxel grids or point clouds, which are discrete representations. Instead, we present an alternative approach that predicts a parametric surface deformation, i.e. a mapping from a template to a target geometry. To demonstrate the benefits of such a representation, we train a deep encoder-decoder for single-view reconstruction using our new representation. Our approach, dubbed AtlasNet, is the first deep single-view reconstruction approach able to reconstruct meshes from images without relying on independent post-processing, and it can perform such a reconstruction at arbitrary resolution without memory issues. A more detailed analysis of AtlasNet reveals that it also generalizes better to categories it has not been trained on than other deep 3D generation approaches.
Our second main contribution is a novel shape matching approach based purely on reconstruction via deformations. We show that the quality of the shape reconstructions is critical to obtain good correspondences, and therefore introduce a test-time optimization scheme to refine the learned deformations. For humans and other deformable shape categories deviating by a near-isometry, our approach can leverage a shape template and isometric regularization of the surface deformations. As categories exhibiting non-isometric variations, such as chairs, do not have a clear template, we also learn how to deform any shape into any other and leverage cycle-consistency constraints to learn meaningful correspondences. Our matching-by-reconstruction strategy operates directly on point clouds, is robust to many types of perturbations, and outperformed the state of the art by 15% on dense matching of real human scans.


Teaching

  • Fall 2018: Traitement de l'information et vision artificielle (TIVA), TA, ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall 2018: Apprentissage statistique (MALAP), TA, ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall 2017: Traitement de l'information et vision artificielle (TIVA), TA, ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall 2017: Apprentissage statistique (MALAP), TA, ENPC Master 1, École Nationale des Ponts et Chaussées


Talks

Deep 3D deformations (slides)
T. Groueix
This talk covers my PhD work.

Tutorial: Deep Learning for 3D surface reconstruction (slides)
T. Groueix*, P-A Langlois*

Code and demo

  • NeuralJacobianFields
  • AtlasNet
  • Atlasnet v2
  • 3D-CODED
  • DTI clustering
  • CycleConsistentDeformation
  • Netvision
  • Phd resources
  • ChamferDistancePytorch