Thesis subject

Photorealist Synthetic Images For Deep Learning
Thibault GROUEIX
Advisors: Prof. Mathieu Aubry - Prof. Renaud Marlet
IMAGINE team - Ecole des Ponts ParisTech


Motivation :
The roots of this research project are Deep Learning on the one hand and Computer Graphics on the other hand. Thanks to recent advances in deep learning, Convolutional Neural Networks are state-of-the-art in a range of applications such as computer vision, speech recognition, NLP etc. They discover intricate patterns in the data using the backpropagation algorithm. The later indicates how a network should adjust its internal parameters to make better predictions. This scheme applied to huge datasets is at the core of deep learning recent successes. The quality of the prediction of a model is tighly linked with the quality of the dataset. A "good" dataset provides a large number of instances for a given problem, correct annotations, and a has large intra-class variability.
Though, for specific tasks, it is impossible or too costly to generate new datasets for training. There can be several reasons for this: manual annotation can be too hard for instance or large-scale annotation too costly. This is where computer graphics could provide an incredible benefit : using rendering, we can generate high-quality synthesis images with a reasonable cost. The key idea of this project is to analyze and leverage the power of high-quality rendered images, which we plan to generate with NVIDIA OPTIX, that we already used in a past project [1], or Mitsuba Renderer.
There are three main advantages to generate the dataset through rendering. The first is that the annotation is automatic (everything about a given scene is known at render time). It provides both cost reduction, and more accurate annotation. Second is the possibility to generate large-scale datasets more easily through automated rendering routines. A simple script can modify view points, textures of objects, modify the light sources, and the scene itself. Lastly and maybe most importantly, it allows for annotating tasks that human can not annotate. For instance, if we consider the task "Where does the light comes from in a picture?", manual annotation is very hard for humans. At the very best, we can only give an approximation of the ground-truth label. But in the rendering proccess, the directions of light can be remembered and the correct label can be easily computed.

Previous work :
A number of other tasks can only be correctly labeled through rendering, such as optical flow estimation. In december 2015, researchers from Freibourg University used a synthetic dataset of flying chairs to try to predict optical flow [2]. Their results indicates that one can learn on synthetic data and successfully transfer the learned model on real images. However, many questions on the influence of the quality of the rendering remain unanswered, and will be tackled in this project.

Project plan :
The objective of this project is to bridge computer graphics and deep learning, by providing experimental and theoretical analysis of the mechanisms of transfer between virtual and real datasets. We will focus on the following questions :
- To what extent can we learn CNNs on synthetic images and apply the learned network on real data ? For instance, generating a very high quality synthetic image can take up to a several hours: is there a benefit to such a realistic rendering?
- Can we use rendering to precisely define and evaluate new tasks, such as object insertion in pictures or global illumination light editing?
- How can we rethink neural networks architectures, error function and optimization strategies to tackle new tasks enabled by rendering?

Around this project:
This project will be carried by Thibault Groueix, PhD canditate under the supervision of Prof. Mathieu Aubry and Prof. Renaud Marlet. Thibault GROUEIX is a former Ecole Polytechnique student who did a research internship in rendering at the graphic group of Prof. Tamy Boubekeur that led to a poster in eurographics on the one hand, and a master thesis at EPFL - Biomedical Imaging Group under the supervision of Prof. Adrien Depeursinge and Prof. Michael Unser on the other hand. The Imagine team of the Laboratoire d'Informatique Gaspart-Monge has an acknowledge expertise in deep learning and computer vision. The Inria GraphDeco team is a reference in computer graphics.

[1] M. Boughida, T. Groueix, T. Boubekeur (2016). Interactive Monte-Carlo Ray-Tracing Upsampling . Published as a poster in Eurographics 2016
[2] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., ... & Brox, T. (2015). Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).
[3] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778)