So far we have seen polynomial kernels and the RBF kernel. We showed how the RBF kernel behaves like k-nearest neighbours, which is why it is such a nice general-purpose kernel: very good for most tasks, though not always. We also saw that the kernel trick, or kernelization, is similar to the feature transformation we did in logistic regression. And if you recall, feature transformation is the "art" part of machine learning: there is a lot of domain-specific feature transformation, and you need plenty of domain knowledge, plus some data visualization, to figure out which transforms work best.

Similarly, there are a lot of domain-specific kernels in machine learning. As SVMs became more and more popular in the 1990s and early 2000s, people came up with many specialized kernels for specific tasks, precisely because feature transformation, the most important part of machine learning, is equivalent to kernelization. For example, if you just Google "string kernels", you will find a bunch of articles. Suppose you want to do text classification: there are string kernels for text classification that people researched in the early 2000s. There is a very famous paper from 2002, cited about 1500 times, where specialized kernels were designed for text classification tasks.

Similarly, if you search for genome kernels, you will find kernel-based methods for genomic prediction of complex traits, because SVMs are used extensively in genomics. People from bioinformatics backgrounds, bioinformatics being basically the application of computer science techniques to biology problems, have taken these kernel ideas and designed special kernels for gene prediction. Likewise, if your problem is graph-theoretic, that is, if your data set can be represented as a computer science graph with vertices and edges, there is a lot of literature on graph kernels.

So over the years people have spent a lot of energy and time designing domain-specific kernels. Of course RBF is general purpose, and if you don't have a domain-specific kernel it is preferable to use RBF or polynomial kernels. But given the problem you want to solve, for example our Amazon text classification, you can of course use RBF, but it might be better to use string kernels.
It might be better to use string kernels because they were designed for exactly that task. People have built specialized kernels for text classification, and our Amazon food review problem is, in a nutshell, a text classification problem: you are given some text and you decide whether it belongs to the positive class or the negative class. Similarly, if you are solving a problem with genome data, you can of course use RBF, but it is better to use the specialized genome kernels that people in that domain have spent a lot of energy and time designing.

So effectively, feature transformation plus domain knowledge gives you custom, domain-specific kernels for the relevant problems. Given a real-world problem, it is always worth researching the right kernel for it. You can always use RBF, but it is better if you can find a context-specific kernel. Feature transformation, which is the hard part of machine learning, is therefore partially (not fully) replaced by searching for the appropriate kernel. In feature transformation we had to check whether log x, or x squared, or e to the power x worked better; those are all function transformations. In the case of SVMs, instead of finding the right transformation of x, we are trying to find the right kernel. So whenever you are solving a real-world problem, please search the literature for existing kernels that suit your problem and use them. If you can't find any, RBF is your backup plan.
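And here is a minimal sketch of that backup plan, again assuming scikit-learn and using one of its built-in datasets purely as a placeholder for "your problem": an RBF-kernel SVM with its two hyperparameters, C and gamma, tuned by cross-validation, which is how the general-purpose fallback is typically used in practice.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scale features first: RBF distances are sensitive to feature scales.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Search over the RBF hyperparameters: C (regularization) and gamma (kernel width).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))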