Introduction to SVR: Support Vector Regression

History

In 1964 the original SVM algorithm was formulated by Vapnik and Chervonenkis. In 1992, Boser, Guyon, and Vapnik suggested a way to create non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes. Cortes and Vapnik proposed the current incarnation (soft margin) in 1993 and published it in 1995. Smola and Schölkopf later elucidated the mathematical formulation of the Support Vector Regression (SVR) model.

What is SVR?

Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few differences: the Support Vector Machine can also be used as a regression method, maintaining all the main features that characterize the algorithm (maximal margin). Because the output is a real number with infinitely many possible values, predicting it exactly from the information at hand becomes difficult. A margin of tolerance (epsilon) is therefore set, in analogy to the margin the SVM would already have requested from the classification problem. The algorithm is also somewhat more involved, since this tolerance must be taken into consideration. However, the main idea is always the same: minimize the error by individualizing the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated.
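As a concrete illustration of this margin of tolerance, the following is a minimal sketch using scikit-learn's SVR; the data, kernel choice, and parameter values are illustrative assumptions rather than part of the original description.

# Minimal sketch: fitting an epsilon-SVR with scikit-learn.
# The epsilon parameter is the margin of tolerance described above;
# errors smaller than epsilon are not penalized by the loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)        # one input feature
y = np.sin(X).ravel() + 0.1 * rng.randn(80)     # noisy continuous target

# C trades off flatness of the fitted function against deviations
# larger than epsilon; both values here are illustrative.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)

print(model.predict([[2.5]]))     # continuous-valued prediction
print(len(model.support_))        # number of support vectors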

Optimisation of SVR 

The regression problem is a generalization of the classification problem, in which the model returns a continuous-valued output rather than an output from a finite set. In other words, a regression model estimates a continuous-valued multivariate function. SVMs solve binary classification problems by formulating them as convex optimization problems. The optimization problem entails finding the maximum-margin separating hyperplane while correctly classifying as many training points as possible.
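For reference, the convex optimization behind soft-margin SVM classification is commonly written as follows (a standard textbook formulation, stated here in conventional notation rather than quoted from this article):

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\!\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
\]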

SVMs represent this optimal hyperplane with support vectors. The sparse solution and good generalization of the SVM lend themselves to adaptation to regression problems. The SVM is generalized to SVR by introducing an ε-insensitive region around the function, called the ε-tube. This reformulates the optimization problem as finding the tube that best approximates the continuous-valued function, while balancing model complexity and prediction error.
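To see how the width of the ε-tube balances model complexity against prediction error, the short sketch below varies epsilon and reports how many support vectors remain; a wider tube ignores more training points and yields a sparser (simpler) model. The data and parameter values are again illustrative assumptions.

# Sketch: wider epsilon-tubes leave fewer points outside the tube,
# so fewer support vectors are needed (sparser model, looser fit).
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")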

SVR is formulated as an optimization problem by first defining a convex ε-insensitive loss function to be minimized and finding the flattest tube that contains most of the training instances. A multi-objective function is thus constructed from the loss function and the geometrical properties of the tube. This convex optimization problem, which has a unique solution, is then solved using appropriate numerical optimization algorithms. The hyperplane is represented in terms of support vectors, which are training samples that lie outside the boundary of the tube. As in SVM, the support vectors in SVR are the most influential instances that affect the shape of the tube. In this supervised-learning context, the training and test data are assumed to be independent and identically distributed, drawn from the same fixed but unknown probability distribution.
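In conventional notation, the optimization described above is typically stated as the following primal problem (a standard ε-SVR formulation, not quoted from this article):

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^{*}} \;\;
\tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
\]
\[
\text{subject to} \quad
y_i - \mathbf{w}^{\top}\mathbf{x}_i - b \le \varepsilon + \xi_i, \quad
\mathbf{w}^{\top}\mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^{*}, \quad
\xi_i,\, \xi_i^{*} \ge 0.
\]

Minimizing \(\lVert \mathbf{w} \rVert^{2}\) enforces the flatness of the tube, the slack variables \(\xi_i\) and \(\xi_i^{*}\) penalize training points lying outside the ε-tube, and C controls the trade-off between flatness and the tolerated deviations.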