Writing your own Filter (post 3.5.3)

**Note:** This is also covered in the chapter //Extending WEKA// of the WEKA manual in versions later than 3.6.1/3.7.0, or in snapshots of the stable-3.6/developer version later than 10/01/2010.

=General=
For general information on writing filters, see the Writing your own Filter article as well.

=Choosing the superclass=
The base filters and interfaces are all located in the following package:
code
weka.filters
code
One can basically distinguish between two different kinds of filters:
 * **batch filters**
> they need to see the whole dataset before they can start processing it, which they do in one go
 * **stream filters**
> they can start producing output right away and the data just passes through while being modified
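
To make the distinction above concrete, here is a minimal plain-Java sketch that does not use the WEKA API at all. The class BatchVsStream and its two methods are hypothetical names chosen for illustration: scaling by the maximum needs the whole dataset first (batch-style), whereas adding a fixed offset can be done value by value (stream-style).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of WEKA.
class BatchVsStream {

  /** Batch-style: the maximum is only known after seeing all values. */
  static List<Double> scaleByMax(List<Double> data) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : data)
      max = Math.max(max, v);
    List<Double> result = new ArrayList<Double>();
    for (double v : data)
      result.add(v / max);
    return result;
  }

  /** Stream-style: each value is transformed without looking at the others. */
  static double addOffset(double value, double offset) {
    return value + offset;
  }

  public static void main(String[] args) {
    List<Double> data = Arrays.asList(1.0, 2.0, 4.0);
    System.out.println(scaleByMax(data));  // prints [0.25, 0.5, 1.0]
    for (double v : data)
      System.out.println(addOffset(v, 10.0));
  }
}
```

A batch filter corresponds to the first method, a stream filter to the second.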

With Weka releases later than 3.5.2 (or directly from Subversion) you can subclass one of the following abstract filters, depending on the kind of filter you want to implement:
 * SimpleBatchFilter
 * SimpleStreamFilter
These filters simplify the rather general and complex framework introduced by the abstract superclass weka.filters.Filter. One only needs to implement a couple of abstract methods that process the actual data and, if necessary, override a few existing methods for option handling.

Additionally, filters have been upgraded to become capabilities handlers (weka.core.CapabilitiesHandler), like the classifiers. I.e., one has to override the getCapabilities() method to return a suitably configured Capabilities object.

==SimpleBatchFilter==
Only the following abstract methods need to be implemented:
 * globalInfo()
> returns a short description of what the filter does; will be displayed in the GUI
 * determineOutputFormat(Instances)
> generates the new format, based on the input data
 * process(Instances)
> processes the whole dataset in one go
 * getRevision() - only for Weka >3.5.7
> returns the Subversion revision information, see section Revisions

If you need access to the full input dataset in determineOutputFormat(Instances), then you also need to override the method allowAccessToFullInputFormat() and make it return true.

If more options are necessary, then the following methods need to be overridden:
 * listOptions()
> returns an enumeration of the available options; these are printed if one calls the filter with the //-h// option
 * setOptions(String[])
> parses the given option array that was passed from the command line
 * getOptions()
> returns an array of options, resembling the current setup of the filter
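
The round trip between parsing and returning options can be sketched in plain Java as follows. This is a simplified stand-in, not WEKA code: the class OptionRoundTrip, the //-N// flag and the attribute-name setting are hypothetical, and a real filter would additionally implement listOptions(), handle the options of its superclass, and would typically parse the array with helpers such as weka.core.Utils.getOption rather than by hand.

```java
// Hypothetical illustration only; not part of WEKA.
class OptionRoundTrip {

  /** The setting controlled by the (hypothetical) -N option. */
  private String m_AttributeName = "bla";

  /** Parses "-N <name>"; falls back to the default if the flag is absent. */
  public void setOptions(String[] options) {
    m_AttributeName = "bla";
    for (int i = 0; i < options.length - 1; i++) {
      if (options[i].equals("-N")) {
        m_AttributeName = options[i + 1];
        break;
      }
    }
  }

  /** Returns options that would recreate the current setup. */
  public String[] getOptions() {
    return new String[] {"-N", m_AttributeName};
  }

  public String getAttributeName() {
    return m_AttributeName;
  }

  public static void main(String[] args) {
    OptionRoundTrip a = new OptionRoundTrip();
    a.setOptions(new String[] {"-N", "index"});

    // feeding getOptions() of one object into another copies the setup
    OptionRoundTrip b = new OptionRoundTrip();
    b.setOptions(a.getOptions());
    System.out.println(b.getAttributeName());  // prints index
  }
}
```

The property worth preserving in a real filter is the same: whatever getOptions() returns should be accepted by setOptions(String[]) and lead to an identical configuration.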

The following **example implementation** adds an additional attribute at the end, containing the index of the processed instance:
code format="java"
import weka.core.*;
import weka.core.Capabilities.*;
import weka.filters.*;

public class SimpleBatch extends SimpleBatchFilter {

  public String globalInfo() {
    return "A simple batch filter that adds an additional attribute 'bla' at the end "
      + "containing the index of the processed instance.";
  }

  public Capabilities getCapabilities() {
    Capabilities result = super.getCapabilities();
    result.enableAllAttributes();
    result.enableAllClasses();
    result.enable(Capability.NO_CLASS);  // filter doesn't need class to be set
    return result;
  }

  protected Instances determineOutputFormat(Instances inputFormat) {
    // copy the header only and append the new attribute at the end
    Instances result = new Instances(inputFormat, 0);
    result.insertAttributeAt(new Attribute("bla"), result.numAttributes());
    return result;
  }

  protected Instances process(Instances inst) {
    Instances result = new Instances(determineOutputFormat(inst), 0);
    for (int i = 0; i < inst.numInstances(); i++) {
      double[] values = new double[result.numAttributes()];
      // copy the existing values
      for (int n = 0; n < inst.numAttributes(); n++)
        values[n] = inst.instance(i).value(n);
      // the last value is the index of the processed instance
      values[values.length - 1] = i;
      result.add(new Instance(1, values));
    }
    return result;
  }

  public static void main(String[] args) {
    runFilter(new SimpleBatch(), args);
  }
}
code

==SimpleStreamFilter==
Only the following abstract methods need to be implemented:
 * globalInfo()
> returns a short description of what the filter does; will be displayed in the GUI
 * determineOutputFormat(Instances)
> generates the new format, based on the input data
 * process(Instance)
> processes a single instance and turns it from the old format into the new one
 * getRevision() - only for Weka >3.5.7
> returns the Subversion revision information, see section Revisions

In the example implementation below, the reset() method is only overridden because the random number generator needs to be re-initialized in order to obtain repeatable results.
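
The repeatability argument can be checked with the standard library alone. The following sketch is not WEKA code: the class RepeatableRandom is a hypothetical stand-in whose reset() method mirrors the idea of re-creating the generator with a fixed seed, so that every run after a reset yields the same sequence.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of WEKA.
class RepeatableRandom {

  private Random m_Random;

  /** Re-initializes the generator with a fixed seed. */
  void reset() {
    m_Random = new Random(1);
  }

  /** Draws n numbers from the current generator state. */
  int[] draw(int n) {
    int[] values = new int[n];
    for (int i = 0; i < n; i++)
      values[i] = m_Random.nextInt();
    return values;
  }

  public static void main(String[] args) {
    RepeatableRandom r = new RepeatableRandom();
    r.reset();
    int[] first = r.draw(3);
    r.reset();
    int[] second = r.draw(3);
    System.out.println(Arrays.equals(first, second));  // prints true
  }
}
```

Without the second reset() call the two sequences would differ, since the generator would simply continue from where it left off.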

If more options are necessary, then the following methods need to be overridden:
 * listOptions()
> returns an enumeration of the available options; these are printed if one calls the filter with the //-h// option
 * setOptions(String[])
> parses the given option array that was passed from the command line
 * getOptions()
> returns an array of options, resembling the current setup of the filter

The following **example implementation** of a stream filter adds an extra attribute at the end, which is filled with random numbers:
code format="java"
import weka.core.*;
import weka.core.Capabilities.*;
import weka.filters.*;

import java.util.Random;

public class SimpleStream extends SimpleStreamFilter {

  protected Random m_Random;

  public String globalInfo() {
    return "A simple stream filter that adds an attribute 'bla' at the end "
      + "containing a random number.";
  }

  public Capabilities getCapabilities() {
    Capabilities result = super.getCapabilities();
    result.enableAllAttributes();
    result.enableAllClasses();
    result.enable(Capability.NO_CLASS);  // filter doesn't need class to be set
    return result;
  }

  protected void reset() {
    super.reset();
    // fixed seed, so that repeated runs produce the same numbers
    m_Random = new Random(1);
  }

  protected Instances determineOutputFormat(Instances inputFormat) {
    // copy the header only and append the new attribute at the end
    Instances result = new Instances(inputFormat, 0);
    result.insertAttributeAt(new Attribute("bla"), result.numAttributes());
    return result;
  }

  protected Instance process(Instance inst) {
    double[] values = new double[inst.numAttributes() + 1];
    // copy the existing values
    for (int n = 0; n < inst.numAttributes(); n++)
      values[n] = inst.value(n);
    // the last value is the random number
    values[values.length - 1] = m_Random.nextInt();
    Instance result = new Instance(1, values);
    return result;
  }

  public static void main(String[] args) {
    runFilter(new SimpleStream(), args);
  }
}
code
A **real-world** implementation of a stream filter is the MultiFilter (package weka.filters), which passes the data through all the filters it contains. Depending on whether all the used filters are streamable or not, it acts either as a stream filter or as a batch filter.

=Internals=
Some useful methods of the filter classes:
 * isNewBatch()
> returns true if an instance of the filter was just instantiated via new or a new batch was started via the batchFinished() method.
 * isFirstBatchDone()
> returns true as soon as the first batch was finished via the batchFinished() method. Useful for //supervised// filters, which should not be altered after being trained with the first batch of instances.
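
How these two states interact can be sketched with a small plain-Java stand-in. The class BatchState below is hypothetical and greatly simplified; it is not the actual weka.filters.Filter implementation. Its method names mirror WEKA's isNewBatch(), isFirstBatchDone() and batchFinished(), while the input(double) method and the "trained" mean are assumptions made purely for illustration. The mean is learned from the first batch only and left untouched by later batches.

```java
// Hypothetical illustration only; not the WEKA implementation.
class BatchState {

  private boolean m_NewBatch = true;         // true right after instantiation
  private boolean m_FirstBatchDone = false;
  private double m_Sum = 0;
  private int m_Count = 0;
  private double m_Mean = 0;                 // "trained" on the first batch only

  boolean isNewBatch() {
    return m_NewBatch;
  }

  boolean isFirstBatchDone() {
    return m_FirstBatchDone;
  }

  /** Accepts one value; only the first batch contributes to training. */
  void input(double value) {
    m_NewBatch = false;
    if (!m_FirstBatchDone) {
      m_Sum += value;
      m_Count++;
    }
  }

  /** Ends the current batch; the first call freezes the trained mean. */
  void batchFinished() {
    if (!m_FirstBatchDone && m_Count > 0)
      m_Mean = m_Sum / m_Count;
    m_FirstBatchDone = true;
    m_NewBatch = true;
  }

  double getMean() {
    return m_Mean;
  }

  public static void main(String[] args) {
    BatchState s = new BatchState();
    s.input(2.0);
    s.input(4.0);
    s.batchFinished();                  // first batch: mean becomes 3.0
    s.input(100.0);
    s.batchFinished();                  // second batch: mean is unchanged
    System.out.println(s.getMean());    // prints 3.0
  }
}
```

This is the pattern a supervised filter relies on: parameters are derived from the training batch, and subsequent batches (e.g., a test set) are only transformed with them.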

=See also=
 * Writing your own Filter
 * Writing your own Filter (up to 3.5.3)
 * Writing your own Filter (default)