Introduction

Multi-instance (MI) classification is a supervised learning technique, but it differs from ordinary supervised learning in two ways:
  • each example (also called a bag) contains multiple instances
  • only a single class label is observable for the example as a whole, not for the individual instances in it

Classifiers

Multi-instance classifiers were originally available through a separate software package, the Multi-Instance Learning Kit (MILK). With the introduction of the relational attribute type in the ARFF format, they became part of Weka in version 3.5.3 (developer version only). These classifiers can now be found in the following package:
 weka.classifiers.mi

Data format

The data format for multi-instance classifiers is fairly simple and consists of three attributes:
  • bag-id - nominal attribute; unique identifier for each bag
  • bag - relational attribute; contains the instances of an example
  • class - the class label of each bag (example)
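To make the three-attribute layout concrete, here is a minimal Java sketch that writes the header of a multi-instance ARFF file. It assumes, as in the musk1 example below, that the bag-id and class are nominal and every instance attribute is numeric. `MiArffHeader` and `header` are hypothetical helper names for illustration, not part of Weka's API.

```java
import java.util.List;

class MiArffHeader {
    // Hypothetical helper: builds the header of a multi-instance ARFF file.
    // Assumes nominal bag-ids and class values, and numeric instance attributes.
    static String header(String relation, String bagIdName, List<String> bagIds,
                         List<String> instanceAttrs, List<String> classValues) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // 1) bag-id: nominal attribute listing every bag identifier
        sb.append("@attribute ").append(bagIdName)
          .append(" {").append(String.join(",", bagIds)).append("}\n");
        // 2) bag: relational attribute; its nested attributes describe one instance
        sb.append("@attribute bag relational\n");
        for (String a : instanceAttrs) {
            sb.append("  @attribute ").append(a).append(" numeric\n");
        }
        sb.append("@end bag\n");
        // 3) class: the single label of each bag
        sb.append("@attribute class {").append(String.join(",", classValues)).append("}\n\n");
        sb.append("@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String h = header("musk1", "molecule_name",
                List.of("MUSK-188", "NON-MUSK-199"),
                List.of("f1", "f2", "f3"), List.of("0", "1"));
        System.out.print(h);
    }
}
```

The nested `@attribute` lines between `@attribute bag relational` and `@end bag` describe a single instance; every bag in the data section then holds any number of such instances.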

Weka offers two filters for converting between the flat file (or propositional) format, which is normally used in supervised classification, and the multi-instance format:
  • weka.filters.unsupervised.attribute.PropositionalToMultiInstance
  • weka.filters.unsupervised.attribute.MultiInstanceToPropositional
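The following minimal Java sketch illustrates what the two conversions do conceptually; it is not Weka's implementation, and the `Row`/`Bag` types and method names are hypothetical. Going from propositional to multi-instance format groups the rows that share a bag-id into one bag; going back turns every instance into its own row, which in this sketch simply inherits the bag's id and label (the real filters have their own options). Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BagConversion {
    // One propositional row: bag-id, feature values, class label (hypothetical type).
    record Row(String bagId, double[] features, String label) {}

    // One bag: bag-id, its instances, and the single bag-level label (hypothetical type).
    record Bag(String bagId, List<double[]> instances, String label) {}

    // Propositional -> multi-instance: group rows by bag-id, keeping first-seen order.
    static List<Bag> toBags(List<Row> rows) {
        Map<String, List<double[]>> instances = new LinkedHashMap<>();
        Map<String, String> labels = new HashMap<>();
        for (Row r : rows) {
            instances.computeIfAbsent(r.bagId(), k -> new ArrayList<>()).add(r.features());
            String previous = labels.putIfAbsent(r.bagId(), r.label());
            if (previous != null && !previous.equals(r.label())) {
                throw new IllegalArgumentException("bag " + r.bagId() + " has conflicting labels");
            }
        }
        List<Bag> bags = new ArrayList<>();
        for (Map.Entry<String, List<double[]>> e : instances.entrySet()) {
            bags.add(new Bag(e.getKey(), e.getValue(), labels.get(e.getKey())));
        }
        return bags;
    }

    // Multi-instance -> propositional: one row per instance, copying the bag's id and label.
    static List<Row> toRows(List<Bag> bags) {
        List<Row> rows = new ArrayList<>();
        for (Bag b : bags) {
            for (double[] x : b.instances()) {
                rows.add(new Row(b.bagId(), x, b.label()));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("MUSK-188", new double[] {42, -198}, "1"),
                new Row("MUSK-188", new double[] {42, -191}, "1"),
                new Row("NON-MUSK-199", new double[] {10, 20}, "0")); // toy values
        for (Bag b : toBags(rows)) {
            System.out.println(b.bagId() + ": " + b.instances().size() + " instance(s), class " + b.label());
        }
    }
}
```

Rejecting rows whose labels disagree within a bag is a design choice of this sketch, reflecting the rule that a bag carries exactly one class label.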

Here is an example based on the musk1 dataset from the UCI repository, which is used frequently in publications on MI learning (note: ... denotes omitted content):
  • propositional format:
    This ARFF file lists all attributes on a single level: molecule_name (the bag-id), f1 to f166 (the actual data of the instances) and the class attribute. Each data row is one instance, so a bag spans several rows that share the same molecule_name.
 @relation musk1
 
 @attribute molecule_name {MUSK-jf78,MUSK-jf67,MUSK-jf59,...,NON-MUSK-199}
 @attribute f1 numeric
 @attribute f2 numeric
 @attribute f3 numeric
 @attribute f4 numeric
 @attribute f5 numeric
 ...
 @attribute f166 numeric
 @attribute class {0,1}
 
 @data
 MUSK-188,42,-198,-109,-75,-117,11,23,-88,-28,-27,...,48,-37,6,30,1
 MUSK-188,42,-191,-142,-65,-117,55,49,-170,-45,5,...,48,-37,5,30,1
 ...
  • multi-instance format:
    With the relational attribute, there are only three attributes on the top level: molecule_name, bag and class. The relational attribute holds the instances of each example, each instance consisting of the attributes f1 to f166. The value of the relational attribute is enclosed in quotes, and the individual instances inside the bag are separated by line-feeds (\n).
 @relation musk1
 
 @attribute molecule_name {MUSK-jf78,MUSK-jf67,MUSK-jf59,...,NON-MUSK-199}
 @attribute bag relational
   @attribute f1 numeric
   @attribute f2 numeric
   @attribute f3 numeric
   @attribute f4 numeric
   @attribute f5 numeric
   ...
   @attribute f166 numeric
 @end bag
 @attribute class {0,1}
 
 @data
 MUSK-188,"42,-198,-109,-75,-117,11,23,-88,-28,-27,...,48,-37,6,30\n42,-191,-142,-65,-117,55,49,-170,-45,5,...,48,-37,5,30\n...",1
 ...
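A minimal Java sketch of writing and reading one such data row, assuming numeric instance attributes and following the example above literally, i.e. writing the two characters `\n` between instances inside the double quotes. `MiDataLine`, `format` and `parseBag` are hypothetical names, not Weka API; for real files, Weka's own ARFF reader and writer should be preferred, since they handle the general quoting rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

class MiDataLine {
    // Writes whole numbers without a trailing ".0", as in the musk1 listing.
    static String num(double v) {
        return Math.abs(v) < 1e15 && v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }

    // Formats one bag as a multi-instance @data line: bagId,"inst1\ninst2\n...",class.
    // Instances are joined with the two-character sequence \n, as in the example.
    static String format(String bagId, List<double[]> instances, String label) {
        List<String> parts = new ArrayList<>();
        for (double[] x : instances) {
            StringJoiner values = new StringJoiner(",");
            for (double v : x) {
                values.add(num(v));
            }
            parts.add(values.toString());
        }
        return bagId + ",\"" + String.join("\\n", parts) + "\"," + label;
    }

    // Reverses format(): extracts the quoted bag and splits it back into instances.
    // Assumes the bag-id and class contain no double quotes.
    static List<double[]> parseBag(String line) {
        int open = line.indexOf('"');
        int close = line.lastIndexOf('"');
        String body = line.substring(open + 1, close);
        List<double[]> instances = new ArrayList<>();
        for (String inst : body.split("\\\\n")) { // regex matches a literal backslash followed by n
            String[] fields = inst.split(",");
            double[] x = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                x[i] = Double.parseDouble(fields[i].trim());
            }
            instances.add(x);
        }
        return instances;
    }

    public static void main(String[] args) {
        String line = format("MUSK-188",
                List.of(new double[] {42, -198, -109}, new double[] {42, -191, -142}), "1");
        System.out.println(line);
        System.out.println(parseBag(line).size() + " instances");
    }
}
```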


Links

  • Xin Xu. Statistical learning in multiple instance problem. Master's thesis, University of Waikato, Hamilton, NZ, 2003. 0657.594.
  • MILK homepage