Regression
Parameter | Values | Definition | Tips
---|---|---|---
SVM_Type | 3 - Epsilon-SVR | The nu parameter in Nu-SVR can be used to control the number of support vectors in the resulting model. In ϵ-SVR, by contrast, you have no control over how many data vectors from the dataset become support vectors; it could be a few, it could be many. You do, however, have total control over how much error you allow your model to have: anything beyond the specified ϵ is penalized in proportion to C, the regularization parameter. | A minimal sketch contrasting the two types follows the table.
 | 4 - Nu-SVR | |
Kernel Type | 0 - Linear | linear: u'*v | The radial basis function (RBF) is a general-purpose kernel, used when there is no prior knowledge about the data, for two reasons: (1) the linear kernel is a special case of RBF, since a linear kernel with penalty parameter C has the same performance as an RBF kernel with some parameters (C, gamma); (2) the polynomial kernel has more hyperparameters than the RBF kernel, which increases the complexity of model selection. There are situations where the RBF kernel is not suitable; in particular, when the number of features is very large, one may just use the linear kernel. The four formulas are written out in code after the table.
 | 1 - Polynomial | polynomial: (gamma*u'*v + coef0)^degree |
 | 2 - RBF | radial basis function: exp(-gamma*\|u-v\|^2). This kernel nonlinearly maps samples into a higher-dimensional space, so unlike the linear kernel it can handle the case where the relation between class labels and attributes is nonlinear. |
 | 3 - Sigmoid | sigmoid: tanh(gamma*u'*v + coef0) |
Gamma | [0.000122, 8] | Gamma defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected. |
Degree | | Degree of the polynomial kernel function. Ignored by all other kernels. |
Coef0 | | Independent term in the kernel function. It is only significant for the polynomial and sigmoid kernels. |
Cost (C) | [0.031250, 8192] | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared L2 penalty. | C is 1 by default, which is a reasonable default choice. If you have a lot of noisy observations you should decrease it: decreasing C corresponds to more regularization. A grid-search sketch over the listed C and gamma ranges follows the table.
NU | (0, 1] | A hyperparameter for nu-SVC, one-class SVM, and nu-SVR; it plays a role similar to C. Nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors (the number of support vectors determines the run time). Example: if we want the error to be less than 1%, set nu to 0.01; the number of support vectors will then be more than 1% of the total records. | Nu approximates the fraction of training errors and of support vectors.
Epsilon_SVR (P) | | Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. |
Cachesize | | For C-SVC, Epsilon-SVR, Nu-SVC, and Nu-SVR, the size of the kernel cache has a strong impact on run times for larger problems. | If you have enough RAM available, it is recommended to set the cache size to a higher value than the default of 200 MB, such as 500 MB or 1000 MB.
Termination Criterion | | Tolerance for the stopping criterion. The stopping tolerance affects the number of iterations used when optimizing the model. |
Shrinking | | Shrinking heuristics are there to save training time. They sometimes help and sometimes they do not; it is a matter of runtime rather than convergence. | We found that if the number of iterations is large, then shrinking can shorten the training time.
Probability_Estimates | | Whether to enable probability estimates. |
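
Below is a minimal sketch contrasting the two regression SVM types, using scikit-learn's `SVR` and `NuSVR`, which wrap LIBSVM (svm_type 3 and 4 respectively). The toy data and the specific C, epsilon, and nu values are assumptions for illustration, not recommendations.

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

# Illustrative toy data: noisy samples of a sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Epsilon-SVR: you fix the width of the no-penalty tube (epsilon);
# how many training vectors end up as support vectors is out of your hands.
eps_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)

# Nu-SVR: you fix nu instead -- an upper bound on the fraction of errors
# and a lower bound on the fraction of support vectors; the tube width
# is then determined during training.
nu_svr = NuSVR(kernel="rbf", C=1.0, nu=0.5).fit(X, y)

print("epsilon-SVR support vectors:", eps_svr.support_.size)
print("nu-SVR support vectors:", nu_svr.support_.size)
```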
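
The four kernel formulas in the table can be written out directly; the sketch below does so with NumPy for a pair of sample vectors. The gamma, coef0, and degree values here are arbitrary placeholders, not defaults from any library.

```python
import numpy as np

def linear(u, v):
    # u'*v
    return u @ v

def polynomial(u, v, gamma=0.5, coef0=1.0, degree=3):
    # (gamma*u'*v + coef0)^degree
    return (gamma * (u @ v) + coef0) ** degree

def rbf(u, v, gamma=0.5):
    # exp(-gamma*|u-v|^2)
    return np.exp(-gamma * np.sum((u - v) ** 2))

def sigmoid(u, v, gamma=0.5, coef0=0.0):
    # tanh(gamma*u'*v + coef0)
    return np.tanh(gamma * (u @ v) + coef0)

u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
for kernel in (linear, polynomial, rbf, sigmoid):
    print(kernel.__name__, kernel(u, v))
```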
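
The Gamma and Cost ranges in the table are powers of two (gamma in [2^-13, 2^3] ≈ [0.000122, 8], C in [2^-5, 2^13] = [0.031250, 8192]), which matches the usual LIBSVM-style grid search over exponents. Below is a sketch with scikit-learn's `GridSearchCV`, reusing the toy data from the first example; the exponent step of 2 and the 5-fold cross-validation are assumptions to keep the search small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Exponent grids covering the documented ranges: C in 2^-5 .. 2^13,
# gamma in 2^-13 .. 2^3, stepping by 2 to keep the search cheap.
param_grid = {
    "C": 2.0 ** np.arange(-5, 14, 2),
    "gamma": 2.0 ** np.arange(-13, 4, 2),
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("best parameters:", search.best_params_)
```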