Writing your own Filter (up to 3.5.3)

= General =
For general information on writing filters, see the Writing your own Filter article as well.

= Choosing the superclass =
The base filters and interfaces are all located in the following package:
code format="text"
weka.filters
code
One can basically distinguish between two different kinds of filters:
 * **batch filters**
 > they need to see the whole dataset before they can start processing it, which they do in one go
 * **stream filters**
 > they can start producing output right away and the data just passes through while being modified
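The distinction can be illustrated without Weka at all. The following is a minimal plain-Java sketch; the class and method names (`FilterKinds`, `scaleByMax`, `addOne`) are hypothetical and not part of the Weka API. Scaling by the maximum needs the whole dataset first (batch-style), whereas adding a constant can be applied to each value as it arrives (stream-style).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Weka.
class FilterKinds {

  // Batch-style: the maximum is only known after all values have been seen,
  // so no output can be produced before the whole dataset is available.
  static List<Double> scaleByMax(List<Double> data) {
    double max = 0.0;
    for (double v : data)
      max = Math.max(max, Math.abs(v));
    List<Double> result = new ArrayList<Double>();
    for (double v : data)
      result.add(max == 0.0 ? 0.0 : v / max);
    return result;
  }

  // Stream-style: each value is transformed independently of all others,
  // so output is available as soon as a value comes in.
  static double addOne(double value) {
    return value + 1.0;
  }
}
```

A transformation like `addOne` could be applied to data of any size with constant memory, while `scaleByMax` has to buffer its input.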

With Weka releases later than 3.5.2 (or directly from Subversion) you can subclass one of the following abstract filters, depending on the kind of filter you want to implement:
 * SimpleBatchFilter
 * SimpleStreamFilter

These filters simplify the rather general and complex framework introduced by the abstract superclass Filter. One only needs to implement a couple of abstract methods that process the actual data and, if necessary, override a few existing methods for option handling.

SimpleBatchFilter
Only the following abstract methods need to be implemented:
 * globalInfo()
 > returns a short description of what the filter does; will be displayed in the GUI
 * determineOutputFormat(Instances)
 > generates the new format, based on the input data
 * process(Instances)
 > processes the whole dataset in one go

If more options are necessary, then the following methods need to be overridden:
 * listOptions()
 > returns an enumeration of the available options; these are printed if one calls the filter with the //-h// option
 * setOptions(String[])
 > parses the given option array that was passed from the command line
 * getOptions()
 > returns an array of options, resembling the current setup of the filter
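The three option methods are meant to fit together: the array returned when querying the current options should be parseable again and restore the same setup. The following plain-Java sketch shows that round trip for a single hypothetical option `-N <name>`. The class `NameOption` is an assumption made for illustration, it does not extend any Weka class, and its option listing returns plain strings as a simplification. A real filter would typically rely on the helpers in `weka.core.Utils` instead of parsing by hand.

```java
import java.util.Enumeration;
import java.util.Vector;

// Hypothetical stand-in for a filter with one option, "-N <name>".
// It mirrors the shape of the three option methods without depending on Weka.
class NameOption {
  private String m_Name = "bla";

  // Describes the available options (the kind of text -h would print).
  public Enumeration<String> listOptions() {
    Vector<String> result = new Vector<String>();
    result.add("-N <name>\n\tThe name of the new attribute (default: bla).");
    return result.elements();
  }

  // Parses the option array; falls back to the default if -N is absent.
  public void setOptions(String[] options) {
    m_Name = "bla";
    for (int i = 0; i < options.length - 1; i++) {
      if (options[i].equals("-N")) {
        m_Name = options[i + 1];
        break;
      }
    }
  }

  // Returns the current setup in a form that setOptions can parse again.
  public String[] getOptions() {
    return new String[] {"-N", m_Name};
  }
}
```

Feeding the output of `getOptions()` of one object into `setOptions(...)` of another should leave both with the same configuration.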

The following **example implementation** adds an additional attribute at the end, containing the index of the processed instance:
code format="java"
import weka.core.*;
import weka.filters.*;

public class SimpleBatch extends SimpleBatchFilter {

  // short description, displayed in the GUI
  public String globalInfo() {
    return "A simple batch filter that adds an additional attribute 'bla' at the end "
      + "containing the index of the processed instance.";
  }

  // output format = input format plus the numeric attribute 'bla' at the end
  protected Instances determineOutputFormat(Instances inputFormat) {
    Instances result = new Instances(inputFormat, 0);
    result.insertAttributeAt(new Attribute("bla"), result.numAttributes());
    return result;
  }

  // copies all values of each instance and appends the instance's index
  protected Instances process(Instances inst) {
    Instances result = new Instances(determineOutputFormat(inst), 0);
    for (int i = 0; i < inst.numInstances(); i++) {
      double[] values = new double[result.numAttributes()];
      for (int n = 0; n < inst.numAttributes(); n++)
        values[n] = inst.instance(i).value(n);
      values[values.length - 1] = i;
      result.add(new Instance(1, values));
    }
    return result;
  }

  public static void main(String[] args) {
    try {
      if (Utils.getFlag('b', args))
        Filter.batchFilterFile(new SimpleBatch(), args);
      else
        Filter.filterFile(new SimpleBatch(), args);
    }
    catch (Exception e) {
      e.printStackTrace();
    }
  }
}
code

SimpleStreamFilter
Only the following abstract methods need to be implemented:
 * globalInfo()
 > returns a short description of what the filter does; will be displayed in the GUI
 * determineOutputFormat(Instances)
 > generates the new format, based on the input data
 * process(Instance)
 > processes a single instance and turns it from the old format into the new one

The reset() method is only overridden in the example implementation because the random number generator needs to be re-initialized in order to obtain repeatable results.
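Why re-initialization gives repeatable results can be seen with `java.util.Random` alone: two generators created with the same seed produce the same sequence. The sketch below assumes a hypothetical class `ReseedDemo` that is not part of Weka; it only imitates the re-seeding step.

```java
import java.util.Random;

// Hypothetical demo class; it only shows the effect of re-seeding.
class ReseedDemo {
  private Random m_Random = new Random(1);

  // Starts over from the same fixed seed.
  void reset() {
    m_Random = new Random(1);
  }

  // Draws the next n numbers from the generator.
  int[] draw(int n) {
    int[] result = new int[n];
    for (int i = 0; i < n; i++)
      result[i] = m_Random.nextInt();
    return result;
  }
}
```

Without the call to `reset()`, a second pass over the same data would continue the random sequence and therefore yield different values.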

If more options are necessary, then the following methods need to be overridden:
 * listOptions()
 > returns an enumeration of the available options; these are printed if one calls the filter with the //-h// option
 * setOptions(String[])
 > parses the given option array that was passed from the command line
 * getOptions()
 > returns an array of options, resembling the current setup of the filter

The following **example implementation** of a stream filter adds an extra attribute at the end, which is filled with random numbers:
code format="java"
import weka.core.*;
import weka.filters.*;
import java.util.Random;

public class SimpleStream extends SimpleStreamFilter {

  protected Random m_Random;

  // short description, displayed in the GUI
  public String globalInfo() {
    return "A simple stream filter that adds an attribute 'bla' at the end "
      + "containing a random number.";
  }

  // re-initializes the random number generator for repeatable results
  protected void reset() {
    super.reset();
    m_Random = new Random(1);
  }

  // output format = input format plus the numeric attribute 'bla' at the end
  protected Instances determineOutputFormat(Instances inputFormat) {
    Instances result = new Instances(inputFormat, 0);
    result.insertAttributeAt(new Attribute("bla"), result.numAttributes());
    return result;
  }

  // copies all values of the instance and appends a random number
  protected Instance process(Instance inst) {
    double[] values = new double[inst.numAttributes() + 1];
    for (int n = 0; n < inst.numAttributes(); n++)
      values[n] = inst.value(n);
    values[values.length - 1] = m_Random.nextInt();
    Instance result = new Instance(1, values);
    return result;
  }

  public static void main(String[] args) {
    try {
      if (Utils.getFlag('b', args))
        Filter.batchFilterFile(new SimpleStream(), args);
      else
        Filter.filterFile(new SimpleStream(), args);
    }
    catch (Exception e) {
      e.printStackTrace();
    }
  }
}
code
A **real-world** implementation of a stream filter is the MultiFilter (package weka.filters), which passes the data through all the filters it contains. Depending on whether all the used filters are streamable or not, it acts either as a stream filter or as a batch filter.

= Internals =
Some useful fields of the filter classes:
 * m_NewBatch
 > true if an instance of the filter was just instantiated or a new batch was started
 * m_FirstBatchDone
 > true as soon as the first batch was finished via the batchFinished() method. Useful for //supervised// filters, which should not be altered after being trained with the first batch of instances.
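The idea behind a flag that records whether the first batch has been processed can be sketched in plain Java. The class `CenteringSketch` and its field `firstBatchDone` are hypothetical; the sketch imitates the concept of a filter that is trained once and then left unchanged, and does not reproduce how Weka handles this internally.

```java
// Hypothetical sketch; it imitates the idea of a "first batch done" flag
// and is not how Weka's Filter class is implemented.
class CenteringSketch {
  private boolean firstBatchDone = false;
  private double mean = 0.0;

  // Subtracts the mean learned from the first batch from every value.
  double[] processBatch(double[] batch) {
    if (!firstBatchDone) {
      // Training happens exactly once, on the first batch.
      double sum = 0.0;
      for (double v : batch)
        sum += v;
      mean = batch.length == 0 ? 0.0 : sum / batch.length;
      firstBatchDone = true;
    }
    double[] result = new double[batch.length];
    for (int i = 0; i < batch.length; i++)
      result[i] = batch[i] - mean;
    return result;
  }
}
```

Later batches, such as a test set, are transformed with the statistics of the first batch instead of their own, which keeps training and test data comparable.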

= See also =
 * Writing your own Filter
 * Writing your own Filter (post 3.5.3)
 * Writing your own Filter (default)