Classification
Parameters | Values | Definition | Tips
---|---|---|---
SVM_Type | 0 - C-SVC<br>1 - Nu-SVC<br>2 - One-class SVM | C-SVC and Nu-SVC perform binary and multi-class classification on a dataset. They are similar methods, but accept slightly different sets of parameters and have different mathematical formulations. One-class SVM learns a decision function for novelty detection: classifying new data as similar to or different from the training set. | All three types are illustrated in the first sketch after the table.
Kernel Type | 0 - Linear<br>1 - Polynomial<br>2 - RBF<br>3 - Sigmoid | linear: u'*v<br>polynomial: (gamma*u'*v + coef0)^degree<br>radial basis function: exp(-gamma*\|u-v\|^2), which nonlinearly maps samples into a higher-dimensional space and so, unlike the linear kernel, can handle a nonlinear relation between class labels and attributes<br>sigmoid: tanh(gamma*u'*v + coef0) | The radial basis function (RBF) is a general-purpose kernel, used when there is no prior knowledge about the data, for two reasons. First, the linear kernel is a special case of RBF: the linear kernel with a penalty parameter C has the same performance as the RBF kernel with some parameter pair (C, gamma). Second, the polynomial kernel has more hyperparameters than the RBF kernel, which complicates model selection. There are situations where the RBF kernel is not suitable; in particular, when the number of features is very large, one may just use the linear kernel. The four kernels are shown in a sketch after the table.
Gamma | [0.000122, 8] | Gamma defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected. |
Degree | | Degree of the polynomial kernel function. Ignored by all other kernels. |
Coef0 | | Independent term in the kernel function. It is only significant for the 'polynomial' and 'sigmoid' kernels. |
Cost (C) | [0.031250, 8192] | The parameter C trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly. As C increases, the tendency to misclassify training data decreases (which may lead to overfitting). | C is 1 by default, and that is a reasonable first choice. If you have a lot of noisy observations you should decrease it: decreasing C corresponds to more regularization. A grid search over the C and gamma ranges given here is sketched after the table.
NU | (0,1] | A hyperparameter for Nu-SVC, one-class SVM, and Nu-SVR, similar to C. Nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors (the number of support vectors determines the run time). Example: if we want the training error to be below 1%, we set nu to 0.01, and the number of support vectors will then be more than 1% of the total records. | Nu approximates the fraction of training errors and of support vectors; a sketch after the table checks these bounds empirically.
Cachesize | | For C-SVC, SVR, Nu-SVC, and Nu-SVR, the size of the kernel cache has a strong impact on run times for larger problems. | If you have enough RAM available, it is recommended to set the cache size higher than the default of 200 MB, such as 500 MB or 1000 MB (see the run-time options sketch after the table).
Termination Criterion | | Tolerance for the stopping criterion. The stopping tolerance affects the number of iterations used when optimizing the model. |
Shrinking | | Shrinking is there to save training time. It sometimes helps and sometimes does not; it is a matter of run time rather than convergence. | We found that if the number of iterations is large, shrinking can shorten the training time.
Probability_Estimates | | Whether to enable probability estimates. |
nr_weight | | nr_weight is the number of elements in the arrays weight_label and weight. Each weight[i] corresponds to weight_label[i], meaning that the penalty of class weight_label[i] is scaled by a factor of weight[i]. | A class-weighting sketch follows the table.
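
The sketches below illustrate the parameters above using scikit-learn's `sklearn.svm` module, which wraps libsvm; the datasets, parameter values, and variable names are illustrative assumptions, not part of the table. First, the three `SVM_Type` options:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, NuSVC, OneClassSVM

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

c_svc = SVC(C=1.0).fit(X, y)      # SVM_Type 0: C-SVC, penalty parameter C
nu_svc = NuSVC(nu=0.3).fit(X, y)  # SVM_Type 1: Nu-SVC, parameter nu in (0, 1]

# SVM_Type 2: one-class SVM is trained on a single class only, then labels
# new points as similar to the training set (+1) or different from it (-1).
oc_svm = OneClassSVM(nu=0.05).fit(X[y == 0])
print(oc_svm.predict(X[:5]))
```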
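
Next, a sketch of the four kernel types, passing the parameters each formula uses (`gamma`, `degree`, `coef0`); the chosen values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

kernels = [
    SVC(kernel='linear'),                                # u'*v
    SVC(kernel='poly', degree=3, gamma=0.5, coef0=1.0),  # (gamma*u'*v + coef0)^degree
    SVC(kernel='rbf', gamma=0.5),                        # exp(-gamma*|u-v|^2)
    SVC(kernel='sigmoid', gamma=0.5, coef0=0.0),         # tanh(gamma*u'*v + coef0)
]
for clf in kernels:
    clf.fit(X, y)
    print(clf.kernel, clf.score(X, y))  # training accuracy per kernel
```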
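
The `Gamma` and `Cost (C)` ranges in the table correspond to powers of two (2^-13 to 2^3 and 2^-5 to 2^13), matching the log2 grid used by libsvm's grid-search tooling; assuming that reading, a cross-validated search over those ranges might look like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

param_grid = {
    'C': 2.0 ** np.arange(-5, 14, 2),      # 0.031250 ... 8192
    'gamma': 2.0 ** np.arange(-13, 4, 2),  # 0.000122 ... 8
}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```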
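
A sketch of the `NU` bounds: nu is an upper bound on the fraction of margin errors (and hence roughly on the training error) and a lower bound on the fraction of support vectors. The noise level and nu value here are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=200, n_features=4, flip_y=0.1,
                           random_state=0)

clf = NuSVC(nu=0.2).fit(X, y)  # at most ~20% margin errors,
                               # at least ~20% of samples become SVs

frac_sv = len(clf.support_) / len(X)
train_err = 1.0 - clf.score(X, y)
print(f"support-vector fraction: {frac_sv:.2f} (expected >= nu)")
print(f"training error: {train_err:.2f} (expected <= nu, approximately)")
```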
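
The run-time and convergence options (`Cachesize`, `Termination Criterion`, `Shrinking`, `Probability_Estimates`) map to constructor arguments; the values below follow the tips in the table, while the dataset is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = SVC(
    cache_size=500,    # kernel cache in MB; raise above the 200 MB default if RAM allows
    tol=1e-3,          # termination criterion: stopping tolerance of the optimizer
    shrinking=True,    # shrinking heuristic; affects run time, not convergence
    probability=True,  # enable probability estimates (adds internal cross-validation)
).fit(X, y)

print(clf.predict_proba(X[:3]))  # per-class membership probabilities
```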
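
Finally, libsvm's `nr_weight`, `weight_label`, and `weight` arrays correspond to scikit-learn's `class_weight` dict: the penalty C of class `weight_label[i]` is scaled by `weight[i]`. The imbalance ratio and weights below are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# An imbalanced two-class problem: roughly 90% of samples in class 0.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

# Equivalent to nr_weight=2, weight_label=[0, 1], weight=[1.0, 9.0]:
# the minority class gets a 9x larger misclassification penalty.
clf = SVC(class_weight={0: 1.0, 1: 9.0}).fit(X, y)
print(clf.n_support_)  # number of support vectors per class
```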