
Tuesday, December 06, 2011

Graphics meets Big Data meets Machine Learning

We've all played Where's Waldo as children, and at least for me it was quite a fun game.  So today let's play an image-based Big Data version of Where's Waldo.  I will give you a picture, and you have to find it in a large collection of images!  This is a form of image retrieval, and this particular formulation is also commonly called "image matching."


The only catch is that you are given just one picture, and I am free to replace the picture with a painting or a sketch.  Any two-dimensional pattern is a valid query image, but the key thing to note is that there is only a single input image. Life would be awesome if Google's Picasa had this feature built in!


The classical way of solving this problem is a brute-force nearest-neighbor algorithm -- one which doesn't match pixel patterns directly, but instead compares images using a state-of-the-art descriptor such as GIST.  Back in 2007, at SIGGRAPH, James Hays and Alexei Efros showed this to work quite well once you have a very large database of images!  But the reason the database had to be so large is that a naive nearest-neighbor algorithm is actually quite dumb.  The descriptor might be cleverer than matching raw pixel intensities, but for a machine, an image is nothing but a matrix of numbers, and nobody told the machine which patterns in the matrix are meaningful and which ones aren't.  In short, the brute-force algorithm works if there are similar enough images such that all parts of the input image will match a retrieved image.  But ideally we would like the algorithm to get better matches by automatically figuring out which parts of the query image are meaningful (e.g., the fountain in the painting) and which parts aren't (e.g., the reflections in the water).
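To make the baseline concrete, here is a minimal sketch of brute-force nearest-neighbor matching in Python.  It assumes the descriptors (e.g., GIST vectors) have already been computed; the array shapes and data below are hypothetical.

```python
import numpy as np

def find_matches(query_descriptor, database_descriptors, k=5):
    """Return indices of the k database images closest to the query."""
    # Euclidean distance from the single query to every database image.
    dists = np.linalg.norm(database_descriptors - query_descriptor, axis=1)
    # Smallest distances = best matches, meaningful parts or not.
    return np.argsort(dists)[:k]

# Hypothetical usage: 10,000 images described by 512-D GIST-like vectors.
db = np.random.rand(10_000, 512).astype(np.float32)
query = np.random.rand(512).astype(np.float32)
print(find_matches(query, db))
```

Note that every dimension of the descriptor counts equally here -- which is exactly the dumbness described above.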

A modern approach to solve this issue is to collect a large set of related "positive images" and a large set of unrelated "negative images," and then train a powerful classifier which can hopefully figure out the meaningful bits of the image. But this approach has two problems.  First, with only a single input image it is not clear whether standard machine learning tools have a chance of learning anything meaningful.  The second, significantly worse, issue is that without a category label or tag, how are we supposed to create a negative set?!?  Exemplar-SVMs to the rescue!  We can use a large collection of images from the target domain (the domain we want to find matches from) as the negative set -- as long as the "negative set" contains only a small fraction of potentially related images, learning a linear SVM with a single positive still works.
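Here is a minimal sketch of that idea -- not the actual Exemplar-SVM code -- using a linear SVM trained with one positive and thousands of negatives, where the lone positive is weighted heavily so it isn't drowned out.  The descriptors, class weights, and regularization constant are all hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svm(positive, negatives):
    """Train a linear SVM from a single positive descriptor."""
    X = np.vstack([positive[None, :], negatives])
    y = np.array([1] + [-1] * len(negatives))
    # Weight the single positive heavily; the large negative set only
    # needs to teach the classifier what is NOT meaningful.
    clf = LinearSVC(C=0.1, class_weight={1: 50.0, -1: 0.01}, max_iter=10_000)
    clf.fit(X, y)
    return clf

# The learned weight vector scores new images: higher = better match.
positive = np.random.randn(512)
negatives = np.random.randn(5000, 512)
clf = train_exemplar_svm(positive, negatives)
scores = clf.decision_function(np.random.randn(10, 512))
```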




Here is an excerpt from a TechCrunch article which summarizes the project concisely:

"Instead of comparing a given image head to head with other images and trying to determine a degree of similarity, they turned the problem around. They compared the target image with a great number of random images and recorded the ways in which it differed the most from them. If another image differs in similar ways, chances are it’s similar to the first image. " -- Techcrunch


Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. In SIGGRAPH ASIA, December 2011. [Project Page]



Here is a short listing of some articles which mention our research (thanks, Abhinav!).




Wednesday, August 24, 2011

The vision hacker culture at Google ...

I sometimes get frustrated when developing machine learning algorithms in C++.  And since working in object recognition basically means you have to be a machine learning expert, trying something new and exciting in C++ can be extremely painful.  I don't miss the C++-heavy workflow for vision projects at Google.  C++ is great for building large-scale systems, but not for pioneering object recognition representations.  I like to play with pixels and I like to think of everything as matrices.  But programming languages, software engineering philosophies, and other coding issues aren't going to be today's topic.  Today I want to talk about the one thing that is more valuable than computers, and that is people.  Not just people, but a community of people -- in particular, the culture at vision@Google.


I miss being around the hacker culture at Google.  

The people at Google aren't just hackers, they are Jedis when it comes to building great stuff -- and that is why I recommend a Google internship to many of my fellow CMU vision Robograds (fyi, Robograds are CMU Robotics Graduate Students).  CMU-ers, like Googlers, like to build stuff.  However, CMU-ers are typically younger.


What is a software engineering Jedi, you might ask? 'Tis one who is not afraid of a million cores, one who is not afraid of building something great.  While little boys get hurt by the guns 'n knives of C++, Jedis use their tools like ninjas use their swords. You go into Google as a boy, you come out a man.  NOTE: I do not recommend going to Google and just toying around in Matlab for 3 months.  Build something great, find a Yoda-esque mentor, or at least strive to be a Jedi.  There's plenty of time in grad school for Matlab and writing papers.  If you get a chance to go to Google, take the opportunity to go large-scale and learn to MapReduce like the pros.

Every day I learn about more and more people I respect in vision and learning going to Google, or at least interning there (e.g., Andrej Karpathy, who is starting his PhD@Stanford, and Santosh Divvala, who is a well-known CMU PhD student and vision hacker).  And I really can't blame them for choosing Google over places like Microsoft for the summer.  I can't think of many better places to be -- the culture is inimitable.  I spent two summers in Jay Yagnik's group, and some of the great people I interned with are already full-time Googlers (e.g., Luca Bertelli and Mehmet Emre Sargin).  And what is really great about vision@Google is that these guys get to publish surprisingly often!  Not just throw-away-code kind of publishing, but stuff that fits inside large-scale systems -- stuff which is already inside Google products.  The technology is often inside the Google product before the paper goes public!  Of course it's not easy to publish at a place like Google, because there is just way too much exciting large-scale stuff going on.  Here is a short list of some cool 2010/2011 vision papers (from vision conferences) with significant Googler contributions.



Kernelized Structural SVM Learning

“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[abstract] [pdf]


Finding Meaning on YouTube


“Finding Meaning on YouTube: Tag Recommendation and Category Discovery”, George Toderici, Hrishikesh Aradhye, Marius Pasca, Luciano Sbaiz, Jay Yagnik, Computer Vision and Pattern Recognition, 2010.
[abstract] [pdf]




Here is a very exciting and new paper from SIGGRAPH 2011.  It is a sort of Visual Memex for faces -- congratulations on this paper, guys!  Check out the video below.

[Video: Exploring Photobios from Ira Kemelmacher on Vimeo]




Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, Steven M. Seitz. "Exploring Photobios." ACM Transactions on Graphics (SIGGRAPH), Aug 2011. [pdf]



Finally, here is a very mathematical paper with a sexy title from the vision@google team.  It will be presented at the upcoming ICCV 2011 Conference in Barcelona -- the same conference where I'll be presenting my Exemplar-SVM paper.  


The Power of Comparative Reasoning
Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin. ICCV 2011. [PDF]


P.S. If you're a fellow vision blogger, then come find me in Barcelona@iccv2011 -- we'll go grab a beer.

Friday, June 19, 2009

A Shift of Focus: Relying on Prototypes versus Support Vectors

The goal of today's blog post is to outline an important difference between traditional categorization models in Psychology such as Prototype Models, and Support Vector Machine (SVM) based models.


When solving an SVM optimization problem in the dual (given a kernel function), the answer is represented as a set of weights associated with each of the data-centered kernels. In the Figure above, an SVM is used to learn a decision boundary between the blue class (desks) and the red class (chairs). The sparsity of such solutions means that only a small set of examples is used to define the class decision boundary. All points on the wrong side of the decision boundary, as well as the barely-yet-correctly classified points (within the margin), have non-zero weights -- these are the support vectors. Many Machine Learning researchers get excited about the sparsity of such solutions because, in theory, we only need to remember a small number of kernels at test time. However, the decision boundary is defined with respect to the problematic examples (the misclassified and barely-classified ones) and not the most typical examples. The most typical (and easiest to recognize) examples are not even necessary to define the SVM decision boundary. Two data sets that have the same problematic examples, but significant differences among the "well-classified" examples, might result in the exact same SVM decision boundary.
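As a quick illustration (on synthetic data, not from the post), the snippet below fits a kernel SVM and counts how many training points survive as support vectors; every other point has exactly zero weight in the decision function f(x) = sum_i alpha_i y_i k(x_i, x) + b.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)),   # "desks"
               rng.normal(+2.0, 1.0, (200, 2))])  # "chairs"
y = np.array([-1] * 200 + [+1] * 200)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
# Only the problematic points near the boundary become support vectors;
# the typical, easy examples get zero weight and could be deleted
# without changing the decision function at all.
print(len(clf.support_), "of", len(X), "points are support vectors")
```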

My problem with such boundary-based approaches is that by focusing only on the boundary between classes useful information is lost. Consider what happens when two points are correctly classified (and fall well beyond the margin on their correct side): the distance-to-decision-boundary is not a good measure of class membership. By failing to capture the "density" of data, the sparsity of such models can actually be a bad thing. As with discriminative methods, reasoning about the support vectors is useful for close-call classification decisions, but we lose fine-scale membership details (aka "density information") far from the decision surface.


In a single-prototype model (pictured above), a single prototype is used per class and distances-to-prototypes implicitly define the decision surface. The focus is on exactly the 'most confident' examples, which are the prototypes. Prototypes are created during training -- if we fit a Gaussian distribution to each class, the mean becomes the prototype. Notice that by focusing on Prototypes, we gain density information near the prototype at the cost of losing fine-details near the decision boundary. Single-Prototype models generally perform worse on forced-choice classification tasks when compared to their SVM-based discriminative counterparts; however, there are important regimes where too much emphasis on the decision boundary is a bad thing.
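For contrast, a single-prototype model fits in a few lines.  This is only a sketch under the Gaussian-fit assumption mentioned above: the prototype is the class mean, and distance to it doubles as a graded typicality score.

```python
import numpy as np

def fit_prototypes(X, y):
    """One prototype per class: the class mean."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, prototypes):
    """Pick the class with the nearest prototype; the distance itself
    is a graded measure of class membership."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

X = np.vstack([np.random.normal(-2.0, 1.0, (100, 2)),
               np.random.normal(+2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
prototypes = fit_prototypes(X, y)
print(classify(np.array([1.5, 2.0]), prototypes))
```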

In other words, Prototype Methods are best at what they were designed to do in categorization, namely capturing Typicality Effects (see Rosch). It would be interesting to come up with more applications where handling Typicality Effects and graded membership becomes more important than making close-call classification decisions. I suspect that in many real-world information retrieval applications (where high precision is required and low recall is tolerated), going beyond boundary-based techniques is the right thing to do.

Tuesday, November 04, 2008

Linear Support Vector Machine (SVM) in the Primal

I often solve linear SVMs. These are convex optimization problems, much like logistic regression, where the goal is to find a linear decision boundary between two classes. Unlike logistic regression, only the points near the decision boundary affect its parameters. Data points that are correctly classified by a margin do not influence the decision boundary.

Quite often when SVMs are taught in a Machine Learning course, the lecture jumps into the dual and kernelization very quickly. While the formulation looks nice and elegant, I'm not sure how many students could come home after such a lecture and implement an SVM in a language such as MATLAB on their own.

I have to admit that I never really obtained a good grasp of Support Vector Machines until I sat through John Lafferty's lectures in 10-702: Statistical Machine Learning, which demystified them. The main idea is that an SVM is just like logistic regression but with a different loss function -- the hinge loss function.
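That parallel is easy to state in code.  The sketch below writes both losses as functions of the margin m = y * (w . x); nothing here is specific to any library.

```python
import numpy as np

def logistic_loss(m):
    # logistic regression: log(1 + e^{-m})
    return np.log1p(np.exp(-m))

def hinge_loss(m):
    # linear SVM: max(0, 1 - m)
    return np.maximum(0.0, 1.0 - m)

# Both penalize badly misclassified points (m << 0) almost identically,
# but the hinge is exactly zero once m >= 1, which is why comfortably
# classified points have no influence on the SVM solution.
```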

A very good article on SVMs, and how they can be efficiently tackled in the primal (which is super-easy to understand), is Olivier Chapelle's Training a Support Vector Machine in the Primal. Chapelle is a big proponent of primal optimization, and he has some arguments on why primal approximations are better than dual approximations. On the same webpage one can also find MATLAB code for SVMs which is very fast in practice since it uses a second-order optimization technique. In order to use such a second-order technique, the squared hinge loss is used (see the article for why). I have used this code many times in the past, even for linear binary classification problems with over 6 million data points embedded in 7 dimensions.

In fact, this is the code that I have used in the past for distance function learning. So the next time you want to throw something simple and fast at a binary classification problem, check out Chapelle's code.
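And if you just want to see the mechanics, here is a toy gradient-descent sketch of that same primal squared-hinge objective.  To be clear, this is my illustrative Python version, not Chapelle's second-order MATLAB code (which converges far faster).

```python
import numpy as np

def train_primal_svm(X, y, C=1.0, lr=1e-3, iters=500):
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ w)
        viol = np.maximum(0.0, 1.0 - margins)     # zero for well-classified points
        grad = w - 2.0 * C * (X.T @ (y * viol))   # gradient of the objective
        w -= lr * grad
    return w

# Hypothetical usage on toy data: labels in {-1, +1}, points in 7-D.
X = np.random.randn(1000, 7)
y = np.sign(X[:, 0] + 0.1 * np.random.randn(1000))
w = train_primal_svm(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```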