Understanding deep features with computer-generated imagery
Supplemental Material

Mathieu Aubry and Bryan Russell

1. Embeddings for different categories

We built a tool to interactively visualize the embeddings corresponding to the first 10 PCA coordinates for our different variation factors, networks, object categories, and layers. It is an extension of figure 2 in the paper. We tested the tool with Firefox, Safari, and Internet Explorer.
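The embeddings shown in the tool are PCA projections. The following is a minimal sketch of that step. It assumes the features for one variation factor are stacked as rows of a NumPy array, for instance one hypothetical feature vector per value of the factor. Under that assumption, the first 10 PCA coordinates can be computed with a thin SVD. The function name `pca_embedding` and the input layout are illustrative assumptions, not the code behind the tool.

```python
import numpy as np


def pca_embedding(features: np.ndarray, n_components: int = 10) -> np.ndarray:
    """Project feature vectors (rows) onto their leading principal components.

    Assumption: `features` has shape (n_samples, n_dims), e.g. one feature
    vector per value of a variation factor. Returns an array of shape
    (n_samples, k) with k = min(n_components, n_samples, n_dims).
    """
    # PCA is defined on centered data.
    centered = features - features.mean(axis=0, keepdims=True)
    # Thin SVD: rows of vt are the principal directions, ordered by
    # decreasing singular value (i.e. decreasing explained variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    # Coordinates of each sample along the first k directions.
    return centered @ vt[:k].T


# Example with random stand-in data: 36 "factor values", 256-dim features.
rng = np.random.default_rng(0)
coords = pca_embedding(rng.normal(size=(36, 256)))
print(coords.shape)  # (36, 10)
```

The SVD of the centered matrix is used instead of an explicit covariance matrix. This avoids forming a d × d matrix when the feature dimension is much larger than the number of samples.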

2. Extensive quantitative results

We present detailed quantitative results here. Note that the high-level observations made in the paper are consistent across categories. Interestingly, the results also vary with the category, which may provide a quantitative basis for claiming that some categories (e.g., chairs) are more diverse than others (e.g., cars).

3. 2D-3D nearest neighbors

This corresponds to the experiment presented in section 4.3 of the paper. Given a natural image, we retrieve rendered views of 3D models, using the dot product between pool5 features as the similarity measure. Note that we do not claim these results to be state of the art. The images for which the orientation was correctly estimated, and which were used in the AMT experiment, can be seen here.
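The following is a minimal sketch of the retrieval step. It assumes the pool5 activations of the natural image and of every rendered view have already been extracted and flattened into vectors; feature extraction itself is not shown. The function name and the toy data are hypothetical.

```python
import numpy as np


def retrieve_rendered_views(query: np.ndarray, rendered: np.ndarray, top_k: int = 5):
    """Rank rendered views by dot-product similarity to a query feature.

    query:    flattened feature of the natural image, shape (d,)
    rendered: flattened features of the rendered views, shape (n_views, d)
    Returns (indices, scores) of the top_k most similar views, best first.
    """
    scores = rendered @ query  # one dot product per rendered view
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]


# Toy example with 2-dimensional stand-in features.
views = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
idx, sims = retrieve_rendered_views(np.array([1.0, 0.5]), views, top_k=2)
print(idx, sims)  # [2 0] [3. 1.]
```

The similarity is an unnormalized dot product, as described above. As a result, views whose features have a larger norm can score higher even if they point in a less similar direction, as the third toy view does here.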

The images for which the orientation error was larger than 20 degrees can be seen here.
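The 20-degree split can be made explicit with a wrap-around angular error. This sketch assumes orientation is a single angle (e.g., azimuth) in degrees. The function names are hypothetical, and the exact way orientation is measured in the paper is not restated here.

```python
def angular_error_deg(predicted: float, true: float) -> float:
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    diff = abs(predicted - true) % 360.0
    # Going the other way around the circle may be shorter.
    return min(diff, 360.0 - diff)


def is_large_error(predicted: float, true: float, threshold: float = 20.0) -> bool:
    """True if the orientation error is strictly larger than the threshold."""
    return angular_error_deg(predicted, true) > threshold


print(angular_error_deg(350.0, 10.0))  # 20.0
print(is_large_error(350.0, 10.0))     # False (exactly 20 degrees is not larger)
print(is_large_error(90.0, 45.0))      # True
```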

4. Additional experiments

We conducted additional experiments that did not seem essential to the 8-page version of the paper but may be of interest:

Synthetic foreground color on background. Similar to the experiments presented in section 4.1 of the paper, we considered a square of one color on a background of a different color. The detailed quantitative results can be seen here.
Real images with pose variation. As an alternative to the link made between real and synthetic images in section 5.3 of the paper, we applied the same methodology as in section 5.2 directly to real images. As stated in the introduction, this considerably limits the number of variations one can explore; in particular, only a few instances are typically available for each category. We present results with AlexNet on the ETH-80 dataset [1] (8 categories with 10 instances each, seen under 41 different viewpoints) here.
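Stimuli for the synthetic foreground color on background experiment can be generated in a few lines. This is a sketch under assumptions not given in the text. The 227 × 227 RGB canvas (a common AlexNet input size), the centered 64-pixel square, and the example colors are all hypothetical choices.

```python
import numpy as np


def square_on_background(image_size=227, square_size=64, top_left=None,
                         fg_color=(255, 0, 0), bg_color=(0, 0, 255)):
    """Return a uint8 RGB image: a square of fg_color on a bg_color canvas."""
    if top_left is None:
        # Center the square by default.
        offset = (image_size - square_size) // 2
        top_left = (offset, offset)
    row, col = top_left
    limit = image_size - square_size
    if not (0 <= row <= limit and 0 <= col <= limit):
        raise ValueError("the square must lie fully inside the image")
    img = np.empty((image_size, image_size, 3), dtype=np.uint8)
    img[:, :] = bg_color  # fill the whole canvas with the background color
    img[row:row + square_size, col:col + square_size] = fg_color
    return img


# One stimulus per ordered pair of distinct colors from a small palette.
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
stimuli = [square_on_background(fg_color=fg, bg_color=bg)
           for fg in palette for bg in palette if fg != bg]
print(len(stimuli))  # 6
```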
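The ETH-80 grid described above contains 8 × 10 × 41 = 3,280 images. The methodology of section 5.2 is not restated in this supplement, so the following is only an illustrative sketch of one step such an analysis might involve. It arranges precomputed features by (category, instance, viewpoint) and isolates a viewpoint component as the mean over categories and instances minus the global mean. The array layout and the function name are assumptions.

```python
import numpy as np

# ETH-80 structure as described above: categories x instances x viewpoints.
N_CATEGORIES, N_INSTANCES, N_VIEWS = 8, 10, 41


def viewpoint_component(features: np.ndarray) -> np.ndarray:
    """Return one centered mean feature per viewpoint.

    Assumption: `features` has shape (N_CATEGORIES, N_INSTANCES, N_VIEWS, d),
    i.e. one d-dimensional feature per image. The result has shape (N_VIEWS, d).
    """
    if features.shape[:3] != (N_CATEGORIES, N_INSTANCES, N_VIEWS):
        raise ValueError(f"unexpected grid shape {features.shape[:3]}")
    global_mean = features.mean(axis=(0, 1, 2))
    # Average over categories and instances, then remove the global mean.
    return features.mean(axis=(0, 1)) - global_mean


print(N_CATEGORIES * N_INSTANCES * N_VIEWS)  # 3280
rng = np.random.default_rng(0)
component = viewpoint_component(rng.normal(size=(N_CATEGORIES, N_INSTANCES, N_VIEWS, 16)))
print(component.shape)  # (41, 16)
```

Because the grid is complete (every instance is seen under every viewpoint), these per-viewpoint components average to zero over viewpoints. They could then be projected with PCA in the same way as the embeddings in section 1.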