Can computers recognize Gaudí?

The following is a guest post by the PAPIs.io '14 Hack Night winning team, Marian Moldovan (@marianmoldovan) and Enrique Otero (@meteotester) of the Innovation Department at BEEVA. The Hack Night took place from 6pm to 11pm on tutorials day and a prize was kindly offered by Import.io. In this post, Marian goes over the hack they did using tools they had learnt about earlier that day during tutorials, and how they were able to analyze images of buildings in Barcelona in an attempt to automatically detect different architectural styles.

The first day at PAPIs.io was our favorite. We attended many tutorials on interesting tools and took part in the Hack Night, where we wanted to dig deeper into the tools that had impressed us most during the day (Ŷhat, GraphLab and Indico.io). Our first idea was a computer vision system for recognizing buildings' architectural styles. Living in Barcelona, we found it particularly appealing to have a computer recognize the genius of Antoni Gaudí! Here's what we managed to do during the time we spent hacking...

 

Our approach

As a starting point we built a dataset of 100 images using the Google Image Search API and the query “Barcelona building”. We used the Indico.io Image Features API to extract a 2048-dimensional feature vector from each image, which let us compute similarities between buildings (the technology behind this is Deep Learning, more specifically convolutional neural networks pre-trained on ImageNet). We then used an R implementation of the t-SNE algorithm to project the images into a 3D feature space, and rendered the result with OpenGL using the rgl package. t-SNE is a machine learning algorithm for dimensionality reduction that maps high-dimensional objects to low-dimensional points (here, three dimensions) in such a way that similar objects end up close together and dissimilar objects end up far apart. Since proximity in this 3D space represents similarity between images, we thought of using the visualization to find clusters of buildings with a similar style. In the animation below we rotate through this space to get an idea of where the images are located.
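The pipeline can be sketched in a few lines of Python. Note this is not our original code (we called the Indico.io API and ran t-SNE in R); here scikit-learn's TSNE stands in, and random vectors stand in for the 2048-dimensional Indico.io features:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the 2048-dimensional feature vectors that the Indico.io
# Image Features API returns: one row per building image.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2048))

# Project the 2048-d vectors down to 3 dimensions with t-SNE, so that
# images with similar features end up as nearby points in 3D space.
tsne = TSNE(n_components=3, perplexity=30, init="random", random_state=0)
coords = tsne.fit_transform(features)  # shape (100, 3), ready to plot
```

A 3D scatter plot of `coords` is the analogue of the view we rotated with rgl.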

 

Image proximity in feature space

If we look at points 77 and 84 in the 3D representation above, we can see that they move together and have almost the same color, clearly distinct from their closest neighbors. It turns out the corresponding images are of the same building:

papis-2.png

Moreover, if we look at the group of images 38, 33, 90, 28, 67, 59 and 19 (bottom-left quadrant, all with the same color), we can verify that they too correspond to a single building:

papis-3.png

Gaudí’s “Casa Batlló” style has clearly been identified. Digging a little deeper, we can see that a large share of the images of Gaudí's work lies on the left-hand side of the visualization in feature space. Specifically, the top-left corner gathers several pictures of “Casa Milá” (43, 70, 48), with some from “Casa Batlló” (3) and “Sagrada Familia” (55, 36, 72, 35) in the middle. While clearly separated, most of the other pictures towards the left-hand side belong to modern buildings, mainly skyscrapers, like the Agbar tower, the Media-Tic building, the W-Barcelona hotel or the Diagonal Zero-Zero tower.
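We picked these clusters out by eye in the 3D view. As an aside, the same grouping could be automated by clustering the t-SNE coordinates, for example with k-means; the sketch below uses random stand-in coordinates, and the choice of 5 clusters is arbitrary:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the 3D t-SNE coordinates of the 100 building images.
rng = np.random.default_rng(1)
coords = rng.normal(size=(100, 3))

# Partition the points into candidate "style" clusters; images sharing a
# label should depict similar-looking buildings.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

# List the first few image indices that fall in each cluster.
for c in range(5):
    members = np.flatnonzero(labels == c)
    print(c, members[:8])
```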

 

Finding the shortest path

Based on the image features generated earlier, we were also able to compute the "shortest path" from one building to another, made up of transitions between similar buildings. Here's what we got with two buildings of very different styles: the Agbar tower (modern) and Gaudí’s “Casa Batlló”:

This makes sense, visually speaking! To compute this path we used an implementation provided by GraphLab — see our notebook in the GraphLab gallery (UPDATE: GraphLab changed their name to Dato). You'll find the rest of our code on Github.
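For readers without GraphLab/Dato, the idea behind this shortest path is straightforward to reproduce: treat each image as a node, connect it to its k nearest neighbors in feature space, and run Dijkstra. Here is a minimal self-contained sketch (the helper name, the toy data and the choice of k are ours, not the original notebook's):

```python
import heapq
import numpy as np

def knn_shortest_path(features, src, dst, k=5):
    """Dijkstra over a k-nearest-neighbour graph of image feature vectors.

    Each image is a node; edges link an image to its k closest images,
    weighted by Euclidean distance in feature space, so the resulting
    path is a chain of visually similar intermediate images.
    """
    # Pairwise Euclidean distances between feature vectors.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # k nearest neighbours of each node (column 0 is the node itself).
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]

    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            break
        if cost > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in nbrs[u]:
            alt = cost + d[u, v]
            if alt < dist.get(v, float("inf")):
                dist[v], prev[v] = alt, u
                heapq.heappush(heap, (alt, v))

    # Walk back from dst to src, then reverse (assumes dst is reachable).
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy example: 10 points on a line, so the path walks through neighbours.
pts = np.arange(10, dtype=float).reshape(-1, 1)
path = knn_shortest_path(pts, 0, 9, k=2)
print(path)
```

With real 2048-dimensional features, `src` and `dst` would be the indices of the two buildings (e.g. the Agbar tower and “Casa Batlló”).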

 

What next?

With this hack we mainly wanted to play with computer vision and some of the technology presented at PAPIs.io. We plan to develop a mobile app, maybe with augmented reality, to help tourists identify buildings. We could also help them create custom city tours, focused on one architectural style or with transitions from one style to another. The 100 images we used here were downloaded via the Google Image Search API, which has restrictions, so we'll have a look at workarounds that would let us get more images — maybe using Import.io?

 

Wrapping up

We presented our work at the end of the Hack Night and we also gave a lightning talk on the second day of the conference (see slides below). To recharge our batteries after the Hack Night we went for a drink and something to eat with people from BigML. Not only did they tell us interesting stuff, they also kindly offered to pay the bill — thanks guys! Many thanks to Import.io too for the prize!

Import.io Magic is the ultimate in data extraction speed. It's the first data extraction tool to require absolutely no training on the part of the user: simply paste in a URL and their algorithms will do the rest. It represents a major step forward in Import.io's greater mission of making web data accessible to everyone.

Louis Dorard

Author of Bootstrapping Machine Learning.