Description

Wrapper class for the libsvm library by Chih-Chung Chang and Chih-Jen Lin. The original wrapper, named WLSVM, was developed by Yasser EL-Manzalawy. The current version is a complete rewrite of the wrapper that uses Reflection to avoid compilation errors when libsvm.jar is not in the CLASSPATH.

Important note:
As of Weka 3.7.2, installing and using libsvm in Weka has been simplified by the creation of a LibSVM package that can be installed with either the graphical or the command-line package manager.

Reference (Weka <= 3.6.8)


Package

weka.classifiers.functions

Download

The wrapper class has been part of Weka since version 3.5.2. However, libsvm is a third-party tool and needs to be downloaded separately (see libsvm's Reference). It is recommended to upgrade to a post-3.5.3 version (or Subversion) for bug fixes and extensions (the wrapper now contains the distributionForInstance method).

CLASSPATH

Add the libsvm.jar from the libsvm distribution to your CLASSPATH to make it available.

Note: Do NOT then start Weka with java -jar weka.jar. The -jar option overrides the CLASSPATH instead of augmenting it (a very common trap to fall into). Instead use something like this on Linux:
 java -classpath $CLASSPATH:weka.jar:libsvm.jar weka.gui.GUIChooser
or this on Win32 (if you're starting it from the command line):
 java -classpath "%CLASSPATH%;weka.jar;libsvm.jar" weka.gui.GUIChooser

If you're starting Weka from the Start Menu on Windows, you'll have to add the libsvm.jar to your CLASSPATH environment variable. The following steps are for Windows XP (unfortunately, the GUI differs between Windows versions):
  • right-click on My Computer and select Properties from the menu
  • choose the Advanced tab and click on Environment variables at the bottom
  • either add or modify a variable called CLASSPATH and add the libsvm.jar with full path to it

Examples

  • One-class SVM
    This Wekalist post explains how to use the one-class SVM to detect outliers.
  • Class weights
    This Wekalist post explains how to use weights for the classes (-W parameter, weights property in GUI).

Troubleshooting

  • libsvm classes not in CLASSPATH!
    • Check whether the libsvm.jar is really in your CLASSPATH. Execute the following command in the SimpleCLI:
      java weka.core.SystemInfo
      The property java.class.path must list the libsvm.jar. If it is listed, check whether the path is correct.
      If you're on Windows and you find %CLASSPATH% there, see next bullet point to fix this.
    • On Windows, even if you added the libsvm.jar to your CLASSPATH environment variable, Weka may still pop up the error message that the libsvm classes are not in your CLASSPATH. This can happen on Windows 2000 and XP when %CLASSPATH% does not get expanded to its actual value while Weka starts up. You can inspect the CLASSPATH that Weka was started with in the SimpleCLI (see previous bullet point). If %CLASSPATH% is listed there, your system has this problem. This Wekalist post explains how to explicitly add the mysql.jar to RunWeka.ini (works the same for libsvm.jar).
      Note: backslashes have to be escaped, not only once, but twice (they get interpreted by Java twice!). In other words, instead of one you have to use four: C:\some\where then turns into C:\\\\some\\\\where.
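
The checks above can also be done from a small stand-alone Java program. The following is only a minimal sketch, not part of Weka or libsvm: the class name LibsvmClasspathCheck and its method isOnClasspath are made up for illustration. It prints the entries of java.class.path and tries to locate libsvm.svm through reflection. Run it with the same -classpath you use for Weka.

```java
import java.io.File;

/** Hypothetical helper: checks whether a class can be located on the classpath. */
final class LibsvmClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, LibsvmClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print one classpath entry per line so a missing or misspelled
        // libsvm.jar (or an unexpanded %CLASSPATH%) is easy to spot.
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(File.pathSeparator)) {
            System.out.println(entry);
        }
        System.out.println("libsvm.svm found: " + isOnClasspath("libsvm.svm"));
    }
}
```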

Issues with libsvm.jar

This section is based on this Wekalist post.

The following changes were not incorporated in Weka, since they would also require modifying the libsvm Java code, which (I think) is autogenerated from the C code. The authors of libsvm might have to consider that update. It's left to the reader to incorporate these changes.

libsvm.svm uses Math.random

libsvm.svm calls Math.random, so the model it returns usually differs from run to run, even for the same training set and the same svm parameters.

Consequently, if you call libsvm.svm from weka.classifiers.functions.LibSVM and then call it again from libsvm.svm_train, the results also differ.

You can use libsvm.svm_save_model to write the svms to files, and then compare the model file from weka libsvm with the model file from libsvm.svm_predict. You will then see that the ProbA values usually differ.

The Weka Experimenter relies on always using the same random sequences, so that experiments can be repeated with the same results. So, I'm afraid some important design changes are required in libsvm.jar and weka.classifiers.functions.LibSVM to keep that behaviour. We made a quick fix by adding a static Random attribute to the libsvm.svm class:
 static java.util.Random ranGen = new java.util.Random(0);
We then changed all Math.random() invocations to ranGen.nextDouble(). After that, we obtained the same svm from weka libsvm as from libsvm svm_train.
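
Why a fixed seed makes results repeatable can be seen in isolation. The sketch below is not libsvm code: the class SeededRandomSketch and its methods are hypothetical, and only the ranGen field mirrors the quick fix. It draws numbers from a generator seeded with 0 and from Math.random(), whose seed the caller cannot control.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo: a seeded generator repeats its sequence, Math.random() need not. */
final class SeededRandomSketch {

    // Stand-in for the static generator added to libsvm.svm in the quick fix
    static Random ranGen = new Random(0);

    /** Resets the generator to seed 0 (as on a fresh JVM start) and draws n values. */
    static double[] drawSeeded(int n) {
        ranGen = new Random(0);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = ranGen.nextDouble();
        }
        return values;
    }

    /** Draws n values from Math.random(); the sequence is not under the caller's control. */
    static double[] drawUnseeded(int n) {
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = Math.random();
        }
        return values;
    }

    public static void main(String[] args) {
        // Two "runs" with the seeded generator yield identical sequences
        System.out.println("seeded runs equal:   "
                + Arrays.equals(drawSeeded(5), drawSeeded(5)));
        // Two "runs" with Math.random() are not expected to match
        System.out.println("unseeded runs equal: "
                + Arrays.equals(drawUnseeded(5), drawUnseeded(5)));
    }
}
```

Note that a single static generator is shared by all callers in the JVM, so the sequence a given training run sees still depends on how many numbers were drawn before it.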

However, the Weka accuracy results on the primary_tumor data were still worse, so something goes wrong when Weka uses the svm model in the testing step.

Classes without instances

The ARFF format provides some meta-information (e.g. attribute names and types, and the set of possible values for nominal attributes), but the libsvm format doesn't. So if the dataset declares classes that have zero occurrences across all instances, libsvm thinks that these classes don't exist, whereas Weka knows they exist.

For example, the primary tumor dataset contains a class that never appears. During testing, the Weka Experimenter calls:
 public static double svm_predict_probability(svm_model model, svm_node[] x, double[] prob_estimates)
passing the array prob_estimates filled with zeros (array cells are initialized to zero). The size of the array is equal to the number of classes (= 22). On the other hand, if this method is invoked from libsvm.svm_predict, the class that never appears is ignored, so the array size is 21.

So the accuracy results differ depending on where svm_predict_probability is invoked from. I think that better results are obtained if classes without instances are ignored, but I don't know whether that is entirely fair. In fact, the accuracies from weka.libsvm and from libsvm.predict_svm seem to be the same if the class that never appears is removed from the ARFF file.

Note that this problem only appears when testing, because the training code always uses the svm_group_classes method to compute the number of classes, so the Instances.numClasses() value is never used for training. Moreover, the mismatch between the number of classes at training time and at testing time may be the reason behind the worse accuracy results when svm_predict_probability is invoked from Weka, but I haven't proved it yet.

Note that this problem also occurs when a class has fewer examples than the number of folds. For some folds, the class will then have no training examples.
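
The mismatch can be illustrated without Weka or libsvm. The sketch below is hypothetical throughout (class name, method name and the toy labels are made up). It counts only the class labels that actually occur in the data, which is similar in spirit to what svm_group_classes does during training, and compares that with the number of classes declared in the header.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical demo: declared class count versus classes actually present in the data. */
final class ObservedClassCount {

    /** Counts the distinct class labels that occur at least once. */
    static int countObserved(int[] labels) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int label : labels) {
            seen.add(label);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int declaredClasses = 4;            // toy value: what the ARFF header would declare
        int[] labels = {0, 1, 1, 3, 0, 3};  // toy data: class index 2 never occurs

        int observed = countObserved(labels);

        // An array sized from the header has one more cell than one sized from the data
        double[] sizedByHeader = new double[declaredClasses];
        double[] sizedByData = new double[observed];

        System.out.println("declared=" + declaredClasses + ", observed=" + observed);
        System.out.println("array lengths: " + sizedByHeader.length
                + " vs " + sizedByData.length);
    }
}
```

With the toy data above, the header-based array has 4 cells and the data-based array has 3, which corresponds to the 22 versus 21 situation described for the primary tumor dataset.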

We also made a quick fix for this problem:
  1. Add this public method to the libsvm.svm_model class:
 public int getNr_class(){return nr_class;}
  2. Make the following changes to the distributionForInstance method of weka.classifiers.functions.LibSVM:
    • The first line of the method:
 int[] labels = new int[instance.numClasses()];
      could be changed to:
 int[] labels = new int[((svm_model) m_Model).getNr_class()];
    • The last line in the "if(m_ProbabilityEstimates)" block:
 prob_estimates = new double[instance.numClasses()];
      could be changed to:
 prob_estimates = new double[((svm_model) m_Model).getNr_class()];
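
Once prob_estimates only covers the classes seen in training, the caller still needs a distribution over all declared classes. The sketch below shows one way this could be done; it is an assumption for illustration, not the code in Weka. The class DistributionExpansion and the method expand are hypothetical, and the labels array is assumed to hold, for each trained class, its index among the declared classes.

```java
import java.util.Arrays;

/** Hypothetical demo: spread per-trained-class probabilities over all declared classes. */
final class DistributionExpansion {

    /**
     * Builds a distribution of length numDeclaredClasses. Entry labels[k] receives
     * probEstimates[k]; classes that never occurred in training keep probability 0.
     */
    static double[] expand(int[] labels, double[] probEstimates, int numDeclaredClasses) {
        if (labels.length != probEstimates.length) {
            throw new IllegalArgumentException("labels and probEstimates must have equal length");
        }
        double[] result = new double[numDeclaredClasses]; // cells start at 0.0
        for (int k = 0; k < labels.length; k++) {
            result[labels[k]] = probEstimates[k];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 3};                 // toy value: class index 2 was never trained
        double[] probEstimates = {0.2, 0.5, 0.3}; // toy value: one estimate per trained class
        System.out.println(Arrays.toString(expand(labels, probEstimates, 4)));
    }
}
```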