...

Thibault Groueix

thibault.groueix.2012 at polytechnique.org


Short bio: Thibault Groueix is a research engineer at Adobe Research. Previously, he was a research scientist at Naver Labs Europe. He obtained his PhD from the Imagine team of École Nationale des Ponts et Chaussées. His thesis, advised by Mathieu Aubry, received second-place PhD awards from SIF and AFIA. His research focuses on 3D deep learning, specifically 3D surface deformation and 3D human reconstruction.


Internships for PhD students: I regularly look for interns to join us at Adobe Research for summer internships. If you are interested in interning with me, don't be shy: send me an e-mail with your CV, your research interests, and a short description of potential topics you would like to work on. I am currently looking for interns for summer 2023.


Pro-bono mentoring: I am happy to discuss anything related to your research career, especially if you belong to an underrepresented group in STEM, from PhD applications and PhD life to research internships, full-time jobs, and so on. You can schedule a meeting here. I studied in France, so I may have a hard time with questions specific to US universities; for those, you could also try reaching out to Matheus here.


News

  • 06/2022 I am super excited to go to CVPR!
  • 05/2022 I am invited to the SIF annual meeting on June 14th [video][slides][press article].
  • 04/2022 Neural Jacobian Field is accepted at SIGGRAPH [tweet].
  • 04/2022 I am relocating to San Francisco.
  • 01/2022 I am honored to be a finalist of the Gilles Kahn PhD award.

Research

See Google Scholar profile for a full list of publications.

PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
F. Baradel, R. Brégier, T. Groueix, P. Weinzaepfel, Y. Kalantidis, G. Rogez
TPAMI 2022.
@article{baradel2022posebert,
                          title={PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling},
                          author={Baradel, Fabien and Br{\'e}gier, Romain and Groueix, Thibault and Weinzaepfel, Philippe and Kalantidis, Yannis and Rogez, Gr{\'e}gory},
                          journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
                          year={2022}
                          }

Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are hard and expensive to obtain. Although transformers have recently been utilized for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the currently limited training data available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model leveraging temporal information. We showcase variants of PoseBERT with different inputs, varying from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task-agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without fine-tuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert
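PoseBERT's training signal comes from BERT-style masked modeling of MoCap sequences. As an illustration only (not the released code), here is a minimal NumPy sketch of the corruption step, assuming per-frame pose vectors; the function name, mask ratio, and zero "mask token" are hypothetical choices:

```python
import numpy as np

def mask_pose_sequence(poses, mask_ratio=0.15, rng=None):
    """Hide a random subset of frames in a (T, D) pose sequence.

    A transformer would then be trained to reconstruct the hidden
    frames from the visible ones, learning a motion prior from MoCap.
    """
    rng = rng or np.random.default_rng(0)
    mask = rng.random(poses.shape[0]) < mask_ratio  # frames to hide
    corrupted = poses.copy()
    corrupted[mask] = 0.0  # stand-in for a learned mask token
    return corrupted, mask

# Example: a 60-frame sequence of SMPL-style pose parameters (72-D).
poses = np.random.default_rng(1).standard_normal((60, 72))
corrupted, mask = mask_pose_sequence(poses, mask_ratio=0.2)
```

At training time the reconstruction loss is applied on the masked frames; at test time the same module can denoise or in-fill the per-frame predictions of an image-based model.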

Recovering Detail in 3D Shapes Using Disparity Maps
M. Ramirez de Chanlatte, M. Gadelha, T. Groueix, R. Mech
ECCV Workshop, Learning to Generate 3D Shapes and Scenes 2022.
@inproceedings{ramirezdechanlatte2022,
              title={Recovering Detail in 3D Shapes Using Disparity Maps},
              author={de Chanlatte, Marissa Ramirez and Gadelha, Matheus and Groueix, Thibault and Mech, Radomir},
              booktitle={Learning to Generate 3D Shapes and Scenes ECCV 2022 Workshop},
              year={2022}
              }

We present a fine-tuning method to improve the appearance of 3D geometries reconstructed from single images. We leverage advances in monocular depth estimation to obtain disparity maps and present a novel approach to transforming 2D normalized disparity maps into 3D point clouds by using shape priors to solve an optimization on the relevant camera parameters. After creating a 3D point cloud from disparity, we introduce a method to combine the new point cloud with existing information to form a more faithful and detailed final geometry. We demonstrate the efficacy of our approach with multiple experiments on both synthetic and real images.

Learning Joint Surface Atlases
T. Deprelle, T. Groueix, N. Aigerman, V. G. Kim, M. Aubry
ECCV Workshop, Learning to Generate 3D Shapes and Scenes 2022.
@inproceedings{deprelle2022,
              title={Learning Joint Surface Atlases},
              author={Deprelle, Theo and Groueix, Thibault and Aigerman, Noam and  Kim, Vladimir G and Aubry, Mathieu},
              booktitle={Learning to Generate 3D Shapes and Scenes ECCV 2022 Workshop},
              year={2022}
              }

This paper describes new techniques for learning atlas-like representations of 3D surfaces, i.e. homeomorphic transformations from a 2D domain to surfaces. Compared to prior work, we propose two major contributions. First, instead of mapping a fixed 2D domain, such as a set of square patches, to the surface, we learn a continuous 2D domain with arbitrary topology by optimizing a point sampling distribution represented as a mixture of Gaussians. Second, we learn consistent mappings in both directions: charts, from the 3D surface to 2D domain, and parametrizations, their inverse. We demonstrate that this improves the quality of the learned surface representation, as well as its consistency in a collection of related shapes. It thus leads to improvements for applications such as correspondence estimation, texture transfer, and consistent UV mapping. As an additional technical contribution, we outline that, while incorporating normal consistency has clear benefits, it leads to issues in the optimization, and that these issues can be mitigated using a simple repulsive regularization. We demonstrate that our contributions provide better surface representation than existing baselines.

Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes
N. Aigerman, K. Gupta, V. G. Kim, S. Chaudhuri, J. Saito, T. Groueix*
SIGGRAPH 2022.
@inproceedings{aigerman2022neural,
              title={Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes},
              author={Aigerman, Noam and Gupta, Kunal and Kim, Vladimir G and Chaudhuri, Siddhartha and Saito, Jun and Groueix, Thibault},
              booktitle={SIGGRAPH},
              year={2022}
              }

This paper introduces a framework designed to accurately predict piecewise linear mappings of arbitrary meshes via a neural network, enabling training and evaluating over heterogeneous collections of meshes that do not share a triangulation, as well as producing highly detail-preserving maps whose accuracy exceeds current state of the art. The framework is based on reducing the neural aspect to a prediction of a matrix for a single given point, conditioned on a global shape descriptor. The field of matrices is then projected onto the tangent bundle of the given mesh, and used as candidate jacobians for the predicted map. The map is computed by a standard Poisson solve, implemented as a differentiable layer with cached pre-factorization for efficient training. This construction is agnostic to the triangulation of the input, thereby enabling applications on datasets with varying triangulations. At the same time, by operating in the intrinsic gradient domain of each individual mesh, it allows the framework to predict highly-accurate mappings. We validate these properties by conducting experiments over a broad range of scenarios, from semantic ones such as morphing, registration, and deformation transfer, to optimization-based ones, such as emulating elastic deformations and contact correction, as well as being the first work, to our knowledge, to tackle the task of learning to compute UV parameterizations of arbitrary meshes. The results exhibit the high accuracy of the method as well as its versatility, as it is readily applied to the above scenarios without any changes to the framework.
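To build intuition for the Poisson step, here is a deliberately simplified 1D analogue (not the paper's implementation): per-edge target gradients on a vertex chain stand in for the predicted Jacobian field, and vertex positions are recovered by a least-squares solve with one pinned vertex to fix the global translation:

```python
import numpy as np

def poisson_solve_chain(target_grads, x0=0.0):
    """Recover vertex positions on a 1D chain whose finite differences
    best match the target per-edge gradients (least squares), pinning
    the first vertex at x0 to remove the translation ambiguity."""
    E = len(target_grads)
    D = np.zeros((E, E + 1))  # difference operator: (Dx)[i] = x[i+1] - x[i]
    for i in range(E):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    A = np.vstack([D, np.eye(1, E + 1)])  # append the pin constraint row
    b = np.concatenate([target_grads, [x0]])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Consistent gradients are integrated exactly: positions 0, 1, 3, 2.5.
x = poisson_solve_chain(np.array([1.0, 2.0, -0.5]))
```

On a mesh, D becomes the per-triangle gradient operator and the solve a sparse Poisson system, which the paper pre-factorizes once per mesh so that training remains efficient.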

Leveraging MoCap Data for Human Mesh Recovery
F. Baradel*, T. Groueix*, P. Weinzaepfel, R. Brégier, Y. Kalantidis, G. Rogez
3DV 2021.
@inproceedings{baradel2021posebert,
                  title={Leveraging MoCap Data for Human Mesh Recovery},
                  author={Baradel, Fabien and Groueix, Thibault and Weinzaepfel, Philippe and Br{\'e}gier, Romain and Kalantidis, Yannis and Rogez, Gr{\'e}gory},
                  booktitle={3DV},
                  year={2021}
                  }

Training state-of-the-art models for human body pose and shape recovery from images or videos requires datasets with corresponding annotations that are hard and expensive to obtain. Our goal in this paper is to study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods. We find that fine-tuning image-based models with synthetic renderings from MoCap data can increase their performance, by providing them with a wider variety of poses, textures and backgrounds. In fact, we show that simply fine-tuning the batch normalization layers of the model is enough to achieve large gains. We further study the use of MoCap data for video, and introduce PoseBERT, a transformer module that directly regresses the pose parameters and is trained via masked modeling. It is simple, generic and can be plugged on top of any state-of-the-art image-based model in order to transform it into a video-based model leveraging temporal information. Our experimental results show that the proposed approaches reach state-of-the-art performance on various datasets including 3DPW, MPI-INF-3DHP, MuPoTS-3D, MCB and AIST. Test code and models will be available soon.

Deep Transformation-Invariant Clustering
T. Monnier , T. Groueix , M. Aubry
NeurIPS 2020 (Oral).
@inproceedings{monnier2020dticlustering,
                  title={Deep Transformation-Invariant Clustering},
                  author={Monnier, Tom and Groueix, Thibault and Aubry, Mathieu},
                  booktitle={NeurIPS},
                  year={2020}
                  }

Recent advances in image clustering typically focus on learning better deep representations. In contrast, we present an orthogonal approach that does not rely on abstract features but instead learns to predict image transformations and directly performs clustering in pixel space. This learning process naturally fits in the gradient-based training of K-means and Gaussian mixture models, without requiring any additional loss or hyper-parameters. It leads us to two new deep transformation-invariant clustering frameworks, which jointly learn prototypes and transformations. More specifically, we use deep learning modules that enable us to resolve invariance to spatial, color and morphological transformations. Our approach is conceptually simple and comes with several advantages, including the possibility to easily adapt the desired invariance to the task and a strong interpretability of both cluster centers and assignments to clusters. We demonstrate that our novel approach yields competitive and highly promising results on standard image clustering benchmarks. Finally, we showcase its robustness and the advantages of its improved interpretability by visualizing clustering results over real photograph collections.
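To make the "clustering in pixel space after alignment" idea concrete, here is a toy NumPy stand-in (not the paper's code) in which the learned spatial transformer modules are replaced by a small set of integer horizontal shifts:

```python
import numpy as np

def dti_assign(images, prototypes, shifts=(-1, 0, 1)):
    """Assign each image to the prototype with the smallest pixel-space
    distance after the best alignment over the candidate shifts."""
    labels = []
    for img in images:
        best = (np.inf, -1)
        for k, proto in enumerate(prototypes):
            for s in shifts:
                d = float(np.sum((np.roll(proto, s, axis=1) - img) ** 2))
                best = min(best, (d, k))
        labels.append(best[1])
    return np.array(labels)

protos = [np.eye(3), np.zeros((3, 3))]
imgs = [np.roll(protos[0], 1, axis=1)]  # prototype 0, shifted by one pixel
labels = dti_assign(imgs, protos)       # the shifted copy still maps to prototype 0
```

In the full method, prototypes and transformation parameters are learned jointly by gradient descent inside K-means or Gaussian mixture training, rather than enumerated as here.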

Learning elementary structures for 3D shape generation and matching
T. Deprelle , T. Groueix, M. Fisher, V. G. Kim , B. C. Russell, M. Aubry
NeurIPS 2019.
@inproceedings{deprelle2019learning,
              title={Learning elementary structures for 3D shape generation and matching},
              author={Deprelle, Theo and Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G and Russell, Bryan C and Aubry, Mathieu},
              booktitle={NeurIPS},
              year={2019}
              }

We propose to represent shapes as the deformation and combination of learnable elementary 3D structures, which are primitives resulting from training over a collection of shapes. We demonstrate that the learned elementary 3D structures lead to clear improvements in 3D shape generation and matching. More precisely, we present two complementary approaches for learning elementary structures: (i) patch deformation learning and (ii) point translation learning. Both approaches can be extended to abstract structures of higher dimensions for improved results. We evaluate our method on two tasks: reconstructing ShapeNet objects and estimating dense correspondences between human scans (FAUST inter challenge). We show 16% improvement over surface deformation approaches for shape reconstruction and outperform the FAUST inter challenge state of the art by 6%.

Unsupervised cycle-consistent deformation for shape matching
T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, M. Aubry
SGP 2019.
@inproceedings{groueix19cycleconsistentdeformation,
              title = {Unsupervised cycle-consistent deformation for shape matching},
              author = {Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
              booktitle = {Symposium on Geometry Processing (SGP)},
              year = {2019}
              }

We propose a self-supervised approach to deep surface deformation. Given a pair of shapes, our algorithm directly predicts a parametric transformation from one shape to the other respecting correspondences. Our insight is to use cycle-consistency to define a notion of good correspondences in groups of objects and use it as a supervisory signal to train our network. Our method does not rely on a template, assume near-isometric deformations, or require point-correspondence supervision. We demonstrate the efficacy of our approach by using it to transfer segmentation across shapes. We show, on ShapeNet, that our approach is competitive with comparable state-of-the-art methods when annotated training data is readily available, but outperforms them by a large margin in the few-shot segmentation scenario.
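The supervisory signal can be sketched as a cycle-consistency loss: deform shape A to B, B to C, and C back to A, then penalize how far each point lands from where it started. A minimal NumPy version, assuming deformations are given as point-wise functions (a hypothetical interface, not the paper's network):

```python
import numpy as np

def cycle_consistency_loss(points, deformations):
    """Mean squared displacement of points after a cycle of deformations
    (e.g. A->B, B->C, C->A). Driving this to zero, with no ground-truth
    correspondences, pushes the deformations toward consistency."""
    out = points
    for f in deformations:
        out = f(out)
    return float(np.mean(np.sum((out - points) ** 2, axis=1)))

# A cycle of three 120-degree rotations composes to the identity,
# so a perfectly consistent cycle has (near-)zero loss.
theta = 2.0 * np.pi / 3.0
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
pts = np.random.default_rng(0).standard_normal((100, 2))
loss = cycle_consistency_loss(pts, [lambda p: p @ R.T] * 3)
```

In the paper the deformations are predicted by a network and the cycles range over triplets of training shapes, so minimizing this loss is what defines "good" correspondences.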

3D-CODED : 3D Correspondences by Deep Deformation
T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, M. Aubry
ECCV 2018.
@inproceedings{groueix2018b,
              title = {3D-CODED : 3D Correspondences by Deep Deformation},
              author={Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
              booktitle = {ECCV},
              year = {2018}
              }

We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks, which jointly encode 3D shapes and correspondences. This is achieved by factoring the surface representation into (i) a template, that parameterizes the surface, and (ii) a learnt global feature vector that parameterizes the transformation of the template into the input surface. By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template. We show that these correspondences can be improved by an additional step which refines the shape feature by minimizing the Chamfer distance between the input and transformed template. We demonstrate that our simple approach improves on state-of-the-art results on the difficult FAUST-inter challenge, with an average correspondence error of 2.88 cm. We show, on the TOSCA dataset, that our method is robust to many types of perturbations, and generalizes to non-human shapes. This robustness allows it to perform well on real, unclean meshes from the SCAPE dataset.
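The Chamfer distance used in the test-time refinement step is a standard point-cloud metric; for context, here is a minimal brute-force NumPy version (O(N·M) memory, fine for small clouds):

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point clouds P (N, 3) and Q (M, 3):
    average squared distance from each point to its nearest neighbor in the
    other cloud, summed over both directions."""
    d2 = np.sum((P[:, None, :] - Q[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0]])
d = chamfer_distance(P, Q)  # 0.5: only the point at x=1 is unmatched
```

Because it needs no point correspondences, this loss can be minimized at test time over the shape feature to better fit the deformed template to the input.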

AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation
T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, M. Aubry
CVPR 2018 (Spotlight, Best Poster Award at PAISS).
@inproceedings{groueix2018,
              title={{AtlasNet: A Papier-M\^ach\'e Approach to Learning 3D Surface Generation}},
              author={Groueix, Thibault and Fisher, Matthew and Kim, Vladimir G. and Russell, Bryan and Aubry, Mathieu},
              booktitle={Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
              year={2018}
              }

We introduce a method for learning to generate the surface of 3D shapes. Our approach represents a 3D shape as a collection of parametric surface elements and, in contrast to methods generating voxel grids or point clouds, naturally infers a surface representation of the shape. Beyond its novelty, our new shape generation framework, AtlasNet, comes with significant advantages, such as improved precision and generalization capabilities, and the possibility to generate a shape of arbitrary resolution without memory issues. We demonstrate these benefits and compare to strong baselines on the ShapeNet benchmark for two applications: (i) auto-encoding shapes, and (ii) single-view reconstruction from a still image. We also provide results showing its potential for other applications, such as morphing, parametrization, super-resolution, matching, and co-segmentation.
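Each surface element can be pictured as a small MLP that maps a 2D point of the unit square, together with the shape's latent code, to a point in 3D. The NumPy sketch below uses random, untrained weights purely to show the interface; the layer sizes and names are hypothetical:

```python
import numpy as np

def surface_element(uv, latent, params):
    """Map (N, 2) uv samples plus a latent code to (N, 3) surface points
    via a tiny two-layer MLP. Sampling more uv points gives a denser
    surface at no extra memory cost ('arbitrary resolution')."""
    W1, W2 = params
    lat = np.broadcast_to(latent, (uv.shape[0], latent.shape[0]))
    h = np.tanh(np.concatenate([uv, lat], axis=1) @ W1)
    return h @ W2

rng = np.random.default_rng(0)
latent = rng.standard_normal(16)                  # shape code (from an encoder)
params = (0.1 * rng.standard_normal((18, 64)),    # 18 = 2 uv dims + 16 latent dims
          0.1 * rng.standard_normal((64, 3)))
pts = surface_element(rng.random((256, 2)), latent, params)  # 256 surface points
```

The full model learns several such elements jointly with the encoder, so the collection of patches covers the target surface like papier-mâché strips.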

Interactive Monte-Carlo Ray-Tracing Upsampling
M. Boughida, T. Groueix , T. Boubekeur
Eurographics Poster 2016.
@inproceedings {egp.20161048,
              booktitle = {EG 2016 - Posters},
              editor = {Luis Gonzaga Magalhaes and Rafal Mantiuk},
              title = {{Interactive Monte-Carlo Ray-Tracing Upsampling}},
              author = {Boughida, Malik and Groueix, Thibault and Boubekeur, Tamy},
              year = {2016},
              publisher = {The Eurographics Association},
              ISSN = {1017-4656},
              DOI = {10.2312/egp.20161048}
              }



PhD Thesis

Learning 3D Generation and Matching
Thibault Groueix
École Nationale des Ponts et Chaussées (ENPC), 2020.
AFIA award finalist, Gilles Kahn award finalist
@PHDTHESIS{groueix2018thesis,
                  title={{Learning 3D Generation and Matching}},
                  author={Groueix, Thibault},
                  school={Ecole Nationale des Ponts et Chaussees},
                  year={2020}
                  }

The goal of this thesis is to develop deep learning approaches to model and analyse 3D shapes. Progress in this field could democratize artistic creation of 3D assets which currently requires time and expert skills with technical software. We focus on the design of deep learning solutions for two particular tasks, key to many 3D modeling applications: single-view reconstruction and shape matching.

A single-view reconstruction (SVR) method takes as input a single image and predicts a 3D model of the physical world which produced that image. SVR dates back to the early days of computer vision. In particular, in the 1960s, Lawrence G. Roberts proposed to align simple 3D primitives to an input image, making the assumption that the physical world is made of simple geometric shapes like cuboids. Another approach, proposed by Berthold Horn in the 1970s, is to decompose the input image into intrinsic images and use those to predict the depth of every input pixel. Since several configurations of shapes, texture and illumination can explain the same image, both approaches need to make assumptions on the distribution of textures and 3D shapes to resolve the ambiguity. In this thesis, we learn these assumptions from large-scale datasets instead of manually designing them. Learning SVR also allows reconstructing complete 3D models, including parts which are not visible in the input image.
Shape matching aims at finding correspondences between 3D objects. Solving this task requires both a local and global understanding of 3D shapes which is hard to achieve. We propose to train neural networks on large-scale datasets to solve this task and capture knowledge implicitly through their internal parameters. Shape matching supports many 3D modeling applications such as attribute transfer, automatic rigging for animation, or mesh editing.

The first technical contribution of this thesis is a new parametric representation of 3D surfaces which we model using neural networks. The choice of data representation is a critical aspect of any 3D reconstruction algorithm. Until recently, most approaches in deep 3D model generation predicted volumetric voxel grids or point clouds, which are discrete representations. Instead, we present an alternative approach that predicts a parametric surface deformation, i.e. a mapping from a template to a target geometry. To demonstrate the benefits of such a representation, we train a deep encoder-decoder for single-view reconstruction using our new representation. Our approach, dubbed AtlasNet, is the first deep single-view reconstruction approach able to reconstruct meshes from images without relying on independent post-processing, and it can perform such a reconstruction at arbitrary resolution without memory issues. A more detailed analysis of AtlasNet reveals that it also generalizes better than other deep 3D generation approaches to categories it has not been trained on.
Our second main contribution is a novel shape matching approach based purely on reconstruction via deformations. We show that the quality of the shape reconstructions is critical to obtaining good correspondences, and therefore introduce a test-time optimization scheme to refine the learned deformations. For humans and other deformable shape categories deviating by a near-isometry, our approach can leverage a shape template and isometric regularization of the surface deformations. As categories exhibiting non-isometric variations, such as chairs, do not have a clear template, we also learn how to deform any shape into any other and leverage cycle-consistency constraints to learn meaningful correspondences. Our matching-by-reconstruction strategy operates directly on point clouds, is robust to many types of perturbations, and outperformed the state of the art by 15% on dense matching of real human scans.


Teaching

  • Fall   2018 Traitement de l'information et vision artificielle (TIVA), TA - ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall   2018 Apprentissage statistique (MALAP), TA - ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall   2017 Traitement de l'information et vision artificielle (TIVA), TA - ENPC Master 1, École Nationale des Ponts et Chaussées
  • Fall   2017 Apprentissage statistique (MALAP), TA - ENPC Master 1, École Nationale des Ponts et Chaussées


Talks

Deep 3D deformations (slides)
T. Groueix
This talk covers my PhD work.

Tutorial: Deep Learning for 3D surface reconstruction (slides)
T. Groueix*, P-A Langlois*

Code and demo

  • NeuralJacobianFields
  • AtlasNet
  • AtlasNet v2
  • 3D-CODED
  • DTI clustering
  • CycleConsistentDeformation
  • Netvision
  • PhD resources
  • ChamferDistancePytorch