
Tuesday, December 06, 2011

Graphics meets Big Data meets Machine Learning

We've all played Where's Waldo as children, and at least for me it was quite a fun game.  So today let's play an image-based Big Data version of Where's Waldo.  I will give you a picture, and you have to find it in a large collection of images!  This is a form of image retrieval, and this particular formulation is also commonly called "image matching."


The catch is that you are given only one picture, and I am free to replace the picture with a painting or a sketch.  Any two-dimensional pattern is a valid query image, but the key thing to note is that there is only a single input image. Life would be awesome if Google's Picasa had this feature built in!


The classical way of solving this problem is a brute-force nearest-neighbor algorithm, one that doesn't match pixel patterns directly but instead compares images using a state-of-the-art image descriptor such as GIST.  Back in 2007, at SIGGRAPH, James Hays and Alexei Efros showed this to work quite well once you have a very large database of images!  The reason the database had to be so large is that a naive nearest-neighbor algorithm is actually quite dumb.  The descriptor might be cleverer than matching raw pixel intensities, but for a machine an image is nothing but a matrix of numbers, and nobody told the machine which patterns in that matrix are meaningful and which ones aren't.  In short, the brute-force algorithm works if there are enough similar images that all parts of the input image will match a retrieved image.  But ideally we would like the algorithm to get better matches by automatically figuring out which parts of the query image are meaningful (e.g., the fountain in the painting) and which parts aren't (e.g., the reflections in the water).
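To make that baseline concrete, here is a minimal sketch of brute-force nearest-neighbor retrieval, assuming each image has already been reduced to a fixed-length descriptor (a GIST-like vector); how you extract that descriptor is left as a placeholder.

```python
# A minimal sketch of brute-force nearest-neighbor image retrieval.
# Assumes every image is represented by a fixed-length descriptor
# (e.g., GIST); descriptor extraction itself is not shown here.
import numpy as np

def nearest_neighbors(query_descriptor, database_descriptors, k=5):
    """Return indices of the k database images closest to the query."""
    # Euclidean distance between the query and every database descriptor.
    dists = np.linalg.norm(database_descriptors - query_descriptor, axis=1)
    return np.argsort(dists)[:k]
```

Every dimension of the descriptor counts equally here, which is exactly the problem: the reflections in the water matter just as much as the fountain.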

A modern approach to this issue is to collect a large set of related "positive images" and a large set of unrelated "negative images," and then train a powerful classifier which can hopefully figure out the meaningful bits of the image. But with this approach the problem is twofold.  First, with only a single input image, it is not clear whether standard machine learning tools have a chance of learning anything meaningful.  The second, significantly worse, issue is that without a category label or tag, how are we supposed to create a negative set?!?  Exemplar-SVMs to the rescue!  We can use a large collection of images from the target domain (the domain we want to find matches in) as the negative set -- as long as the "negative set" contains only a small fraction of potentially related images, learning a linear SVM with a single positive still works.
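Here is a rough sketch of that idea using scikit-learn, assuming you already have feature vectors for the query image and for a pile of images from the target domain used as negatives.  The function names and the positive-class weight are illustrative only, not the released Exemplar-SVM code.

```python
# A minimal sketch of the exemplar-SVM idea: one positive (the query)
# versus many "negative" images from the target domain. Feature vectors
# are assumed to be pre-computed; names and weights are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svm(query_feature, negative_features, pos_weight=50.0):
    """Train a linear SVM with a single positive and many negatives."""
    X = np.vstack([query_feature[None, :], negative_features])
    y = np.array([1] + [0] * len(negative_features))
    # Up-weight the lone positive so it is not drowned out by the negatives.
    clf = LinearSVC(C=1.0, class_weight={1: pos_weight, 0: 1.0})
    clf.fit(X, y)
    return clf

def rank_database(clf, database_features):
    """Score every database image; higher score means a better match."""
    scores = clf.decision_function(database_features)
    return np.argsort(-scores)  # best matches first
```

The learned weight vector effectively tells you which dimensions of the descriptor make the query distinctive, which is what the plain nearest-neighbor baseline was missing.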




Here is an excerpt from a TechCrunch article which summarizes the project concisely:

"Instead of comparing a given image head to head with other images and trying to determine a degree of similarity, they turned the problem around. They compared the target image with a great number of random images and recorded the ways in which it differed the most from them. If another image differs in similar ways, chances are it’s similar to the first image. " -- Techcrunch


Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. In SIGGRAPH ASIA, December 2011. Project Page



Here is a short listing of some articles which mention our research (thanks, Abhinav!).




Wednesday, November 16, 2011

don't throw away old code: github-it!

My thesis experiments on Exemplar-SVMs (my PhD thesis link: Note, 33MB) would have taken approximately 20 CPU-years to finish.  But not on a fat CMU cluster!  Here is some simple code which helped make things possible in ~1 month of crunching on 200+ cores.  That scale of computation is not quite Google-scale computing, but it was an unforgettable experience as a CMU PhD student.  I've recently had to go back to the SSH / GNU Screen method of starting scripts at MIT, since we do not have torque/pbs there, but I still use these scripts.  Fork it, use it, change it, hack it, improve it, break it, learn from it, etc.

https://github.com/quantombone/warp_scripts

I used these scripts to drive the experiments in my Exemplar-SVM framework (also on GitHub).
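For flavor, here is a hypothetical sketch of the general pattern behind such scripts: split the experiment into many independent chunks and submit each chunk as its own torque/PBS job.  The job names, resource flags, and MATLAB entry point below are made up for illustration; the real warp_scripts repo is what actually drove the experiments.

```python
# A hypothetical sketch of farming out an embarrassingly parallel
# experiment to a torque/PBS cluster: one qsub job per chunk.
# The entry point run_experiment() and job names are illustrative.
import subprocess

def submit_chunk(chunk_id, num_chunks):
    """Submit one chunk of the experiment as a single-core PBS job."""
    job_script = (
        "#!/bin/bash\n"
        "cd $PBS_O_WORKDIR\n"
        # Each job processes its own slice of the exemplars.
        f"matlab -nodisplay -r \"run_experiment({chunk_id},{num_chunks}); exit\"\n"
    )
    subprocess.run(
        ["qsub", "-N", f"esvm_{chunk_id}", "-l", "nodes=1:ppn=1"],
        input=job_script, text=True, check=True,
    )

if __name__ == "__main__":
    NUM_CHUNKS = 200  # roughly one chunk per available core
    for i in range(NUM_CHUNKS):
        submit_chunk(i, NUM_CHUNKS)
```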


The basic take-home message is "do not throw away old code" that you found useful at some point.  C'mon, ex-PhD students, I know you wrote a lot of code; you graduated and now you feel embarrassed to share it.  Who cares if you never had a chance to clean it up -- if the world never gets to see it, it will die a silent death from lack of use.  Just put it on GitHub and let others take a look.  Git is the world's best source control/versioning system, and its distributed nature makes it perfect for large-scale collaboration.  Now with GitHub, sharing is super easy! Sharing is caring.  Let's make the world a better place for hackerdom, one repository at a time.  I've met some great hackers at MIT, such as the great cvondrick, who is still teaching me how to branch like a champ.

Mathematicians share proofs.  Hackers share code.  Embrace technology, embrace GitHub.  If you ever want to hack with me, knowing the basics of git is probably as important as being a master of linear algebra.

Additional Reading:
Distributed Version Control: The Future of History, an article about Git by some Kitware software engineers