Our goal in this paper is to discover near-duplicate patterns in large collections of artworks. This is harder than standard instance mining due to differences in the artistic media (oil, pastel, drawing, etc.) and imperfections inherent in the copying process. The key technical insight is to adapt a standard deep feature to this task by fine-tuning it on the specific art collection using self-supervised learning. More specifically, spatial consistency between neighbouring feature matches is used as the supervisory fine-tuning signal. The adapted feature leads to more accurate style-invariant matching, and can be used with a standard discovery approach, based on geometric verification, to identify duplicate patterns in the dataset. The approach is evaluated on several different datasets and shows surprisingly good qualitative discovery results. For quantitative evaluation of the method, we annotated 273 near-duplicate details in a dataset of 1587 artworks attributed to Jan Brueghel and his workshop. Beyond artwork, we also demonstrate improvement on localization on the Oxford5K photo dataset as well as on historical photograph localization on the Large Time Lags Location (LTLL) dataset.


Part 1: Feature Learning

Our feature learning strategy has three steps: (a) sample a patch and retrieve candidate matches; (b) filter out false-positive candidates using spatial consistency; (c) perform metric learning with the resulting positive and negative pairs.
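Steps (b) and (c) can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the square neighbourhood of size `radius`, the `min_votes` threshold and the generic triplet loss with `margin` are all assumptions made for illustration, and the feature maps are assumed to be L2-normalised arrays of shape (H, W, C).

```python
import numpy as np

def consistency_votes(f1, f2, p1, p2, radius=1):
    """Count neighbours of p1 whose best match near p2 keeps the same offset.

    f1, f2: L2-normalised feature maps of shape (H, W, C) (assumed layout).
    p1, p2: (row, col) of the query patch and of one candidate match.
    """
    (y1, x1), (y2, x2) = p1, p2
    h1, w1, _ = f1.shape
    h2, w2, _ = f2.shape
    # Local search window around the candidate in the second feature map.
    y_lo, y_hi = max(0, y2 - radius), min(h2, y2 + radius + 1)
    x_lo, x_hi = max(0, x2 - radius), min(w2, x2 + radius + 1)
    window = f2[y_lo:y_hi, x_lo:x_hi]
    votes = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre is the candidate itself
            ya, xa = y1 + dy, x1 + dx
            yb, xb = y2 + dy, x2 + dx
            if not (0 <= ya < h1 and 0 <= xa < w1):
                continue
            if not (0 <= yb < h2 and 0 <= xb < w2):
                continue
            # Cosine similarity of this neighbour to every window position.
            sims = window @ f1[ya, xa]
            by, bx = np.unravel_index(np.argmax(sims), sims.shape)
            if by + y_lo == yb and bx + x_lo == xb:
                votes += 1
    return votes

def filter_candidates(f1, f2, p1, candidates, radius=1, min_votes=6):
    """Keep only candidates supported by enough spatially consistent neighbours."""
    return [p2 for p2 in candidates
            if consistency_votes(f1, f2, p1, p2, radius) >= min_votes]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic cosine-similarity triplet loss (a stand-in, not the paper's loss)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))
```

Candidates that survive `filter_candidates` would serve as positives for the metric-learning step, while rejected or unrelated patches would serve as negatives.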

Part 2: Discovery

We estimate an affine transformation from the correspondences between each pair of images. The inlier correspondences of this transformation define the matched patterns.
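As an illustration of this kind of geometric verification, the sketch below fits a 2D affine transformation with a basic RANSAC loop in NumPy. It is only a sketch under stated assumptions: the helper names, the iteration count `n_iters` and the pixel tolerance `tol` are hypothetical and are not taken from the released code.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # (n, 3) homogeneous points
    A_t, *_ = np.linalg.lstsq(X, dst, rcond=None)     # solves X @ A.T = dst
    return A_t.T

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, n_iters=500, tol=3.0, seed=0):
    """Return (A, inlier_mask) for the affine model with the most inliers.

    src, dst: (n, 2) arrays of matched point coordinates.
    tol: maximum reprojection error, in pixels, for a match to be an inlier.
    """
    rng = np.random.default_rng(seed)
    n = src.shape[0]
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        # Three correspondences are the minimal sample for an affine map.
        idx = rng.choice(n, size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        return None, best
    # Refit on all inliers of the best hypothesis for a more stable estimate.
    return fit_affine(src[best], dst[best]), best
```

The returned mask marks the correspondences that agree with the estimated transformation; in the terms used above, these inliers delimit the matched pattern in each image.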

Brueghel dataset

About Brueghel

The Brueghel dataset contains 1,587 artworks done in different media (e.g. oil, ink, chalk, watercolour) and on different materials (e.g. paper, panel, copper), depicting a wide variety of scenes (e.g. landscape, religious, still life).

Our annotations

We introduce new annotations for the Brueghel dataset: we selected 10 of the most commonly repeated details in the dataset and annotated the bounding boxes of the duplicated visual patterns. This resulted in 273 annotated instances, with a minimum of 11 and a maximum of 57 annotations per pattern. Our one-shot detection results are illustrated in the following figure:

Download dataset

Click here to download the Brueghel dataset and our annotations (~400 MB).


More results can be seen in :

Data, Licence, Code, Paper and Slide

To cite our paper,

  title={Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning},
  author={Shen, Xi and Efros, Alexei A and Aubry, Mathieu},
  booktitle={Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},


This work was partly supported by ANR project EnHerit ANR-17-CE23-0008, project Rapid Tabasco, NSF IIS-1633310, Berkeley-France funding, and gifts from Adobe to Ecole des Ponts. We thank Shiry Ginosar for her advice and assistance; Xiaolong Wang, Shell Hu, Minsu Cho, Pascal Monasse and Renaud Marlet for fruitful discussions; and Kenny Oh, Davienne Shields and Elizabeth Alice Honig for their help in defining the task and building the Brueghel dataset.


A new approach to discover visual patterns in art collections in TechXplore.com

From Brueghel to Warhol: AI enters the attribution fray in Nature.com