Object recognition and computer vision 2023

Reconnaissance d'objets et vision artificielle (RecVis) - Master M2 MVA

Teaching assistants: Ricardo Garcia, Guillaume Le Moing, and Charles Raude
Lecture time: Tuesday 16:00-19:00
Lecture room: Salle Dussane, ENS Ulm, 45 rue d'Ulm, 75005 Paris
*A few exceptions to the room and time are denoted in the schedule below.*

News

Course information

Course description
Automated object recognition -- and more generally scene analysis -- from photographs and videos is the grand challenge of computer vision. This course presents the image, object, and scene models, as well as the methods and algorithms, used today to address this challenge.

Assignments
There will be three programming assignments, together worth 50% of the grade (10% + 20% + 20%). The supporting materials for the programming assignments and final projects will be in Python and use Jupyter notebooks. For additional technical instructions on the assignments, please follow this link.

Final project
The final project will represent 50% of the grade.

Collaboration policy
You can discuss the assignments and final projects with other students in the class. Discussions are encouraged and are an essential component of the academic environment. However, each student has to work out their assignment alone (including any coding, experiments, or derivations) and submit their own report. For the final project, you may work alone or in a group of at most two people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group. The final project report needs to explicitly specify the contribution of each student. Both students are expected to present the project at the oral presentation and to contribute equally to writing the report. The assignments and final projects will be checked for originality. Any uncredited reuse of material (text, code, results) will be considered plagiarism and will result in zero points for the assignment or final project. If plagiarism is detected, the student will be reported to the MVA.

Computer vision and machine learning talks
You are welcome to attend seminars in the Imagine and Willow research groups. Please see the seminar schedules for Imagine and Willow. Typically, these are one-hour research talks given by visiting speakers. Imagine talks are at Ecole des Ponts. Willow talks are at Inria, 2 rue Simone Iff, 75012 Paris (when you enter the building, tell the receptionist you are attending a seminar).

Feedback
At any time, during or after the semester, do not hesitate to fill in this form to provide anonymous feedback about the class.


Course schedule (subject to change)

Note: Slides are provided after each lecture.

# Date Lecturer Topic and reading materials Slides
1 Oct 3 Gül Varol,
Jean Ponce
Class logistics: assignments, final projects, grading (G. Varol);
Introduction to visual recognition; Camera geometry; Image processing (J. Ponce)

History: J. Mundy - Object recognition in the geometric era: A retrospective;
Camera geometry: Forsyth & Ponce, Ch. 1-2; Hartley & Zisserman, Ch. 6;
Image processing: End-to-end interpretable learning of non-blind image deblurring [Eboli, Sun and Ponce, ECCV 2022], Lucas-Kanade reloaded: End-to-end super-resolution from raw image bursts [Lecouat, Ponce and Mairal, ICCV 2021]

[logistics] [intro & geometry & img processing]
2 Oct 10 *Salle 1Z18, ENS Paris-Saclay* Gül Varol Instance-level recognition: local invariant features, correspondence, image matching

Scale and affine invariant interest point detectors [Mikolajczyk and Schmid, IJCV 2004], Distinctive image features from scale-invariant keypoints [D. Lowe, IJCV 2004] (SIFT), R. Szeliski, Sections 7.1.1 (feature detectors), 7.1.2 (feature descriptors), 7.1.3 (feature matching), 7.4.2 (Hough transform), 8.1.4 (RANSAC), Video Google: Efficient visual search of videos [Sivic and Zisserman, ICCV 2003] (Bag of features)


Assignment 1 out.
[local features & matching]
3 Oct 17 *Inria, 2 rue Simone Iff, 75012* TAs Python/PyTorch tutorial. Attendance is optional.
4 Oct 24 Armand Joulin Supervised learning and deep learning; Optimization and regularization for neural networks; Introduction to sequence models
Assignment 1 due. Assignment 2 out.
[intro nn] [sequence models]
5 Oct 31 Gül Varol Neural networks for visual recognition: CNNs and image classification; Beyond CNNs: Transformers

Gradient-based learning applied to document recognition [LeCun et al., IEEE 1998] (CNN), ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky et al., NeurIPS 2012] (AlexNet), Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014], Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al., CVPR 2014] (pretraining), Very Deep Convolutional Networks for Large-Scale Visual Recognition [Simonyan and Zisserman, ICLR 2015] (VGGNet), Deep Residual Learning for Image Recognition [He et al., CVPR 2016] (ResNet), Attention is all you need [Vaswani et al., NeurIPS 2017] (Transformers), An image is worth 16x16 words: Transformers for image recognition at scale [Dosovitskiy et al., ICLR 2021] (ViT)

[nn for img classification]
6 Nov 7 *starting 16h30* Gül Varol Beyond classification: Object detection; Segmentation; Human pose estimation [detection & segmentation & pose]
7 Nov 14 *Salle des Actes, 45 rue d'Ulm, 75005* Josef Sivic Large-scale image and video search
Assignment 2 due. Assignment 3 out.
[search]
8 Nov 21 Gül Varol Generative models; Vision & language

-Generation Chapter: Probabilistic Machine Learning: Advanced Topics [Murphy 2023],
-VAEs: Auto-Encoding Variational Bayes [Kingma and Welling, ICLR 2014],
-GANs: Generative adversarial nets [Goodfellow et al., NeurIPS 2014],
-Diffusion: Denoising diffusion probabilistic models [Ho et al., NeurIPS 2020],
-Diffusion tutorial: Understanding Diffusion Models: A Unified Perspective [Luo 2022],
-CLIP: Learning Transferable Visual Models From Natural Language Supervision [Radford et al., ICML 2021],
-Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models [Rombach et al., CVPR 2022],
-BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [Li et al., ICML 2022]

[generative & VL]
9 Nov 28 Ivan Laptev Weakly-supervised learning; Self-supervised learning; Vision for robotics
Assignment 3 due. Final project topics are out.
[weaksup] [selfsup] [robotics]
10 Dec 5 *Amphi Jaurès, 29 rue d'Ulm, 75005* Cordelia Schmid Human action recognition in videos
Final project proposal due.
[videos]
11 Dec 12 Mathieu Aubry 3D computer vision [3D]
12 Jan 8 Gül Varol Final project presentations
The presentations will be virtual, following this schedule.
Final project reports due on Jan 15.

Resources