Use Weka in your Java code 2

Hi, I used Weka in my project. To do so, I implemented some high-level methods (built on top of Weka) to build and filter Instances, and I'd like to share some of them with you.

**The files**
There are three files:
 * 1) The first one is just an example of an ARFF file.
 * 2) The second one, based on Use Weka in your Java Code, uses only the methods seen on that page.
 * 3) The last one contains some new methods to build and filter Instances.



Weka_Use.java
Based on Use Weka in your Java Code, it contains five methods:


 * 1) //test//, to try the four other methods.
 * 2) //buildInstancesP/N//, to build an Instances from another one by selecting its rows by percent (P) or by number (N).
 * 3) //learning//, to create the classifier.
 * 4) //evaluation//, to evaluate the given test Instances with the previously built classifier.

Weka_ManageInstances.java
Contains some methods to build and filter the given Instances.

1. To convert an ARFF String into an Instances.
 * =====instancesFromString=====

code format="java"
// Needs: java.io.IOException, java.io.StringReader, weka.core.Instances
/**
 * Convert a String which represents an ARFF file into an Instances.
 * If no class attribute is set, the last attribute becomes the class.
 * @param arff String which represents an ARFF file.
 * @return The Instances from the ARFF String.
 * @throws IOException if the String cannot be parsed as ARFF.
 */
public static Instances instancesFromString(String arff) throws IOException {
    StringReader reader = new StringReader(arff);
    Instances insts = new Instances(reader);
    // ARFF does not store a class index, so default to the last attribute.
    if (insts.classIndex() == -1)
        insts.setClassIndex(insts.numAttributes() - 1);
    return insts;
}
code
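instancesFromString expects a complete ARFF document in the String and, since ARFF itself does not mark a class attribute, falls back to the last declared attribute. The plain-Java sketch below (no Weka dependency) uses a made-up sample ARFF text and a hypothetical helper `attributeNames` that lists the `@attribute` declarations, to show which attribute ends up as the class. It is only an illustration: quoted attribute names and other ARFF details are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka.
public class ArffClassGuess {

    // A small ARFF document of the kind instancesFromString expects (illustrative data).
    public static final String SAMPLE_ARFF =
        "@relation weather\n" +
        "@attribute temperature numeric\n" +
        "@attribute humidity numeric\n" +
        "@attribute play {yes,no}\n" +
        "@data\n" +
        "21.5,80,yes\n" +
        "30.0,65,no\n";

    // Lists the attribute names in declaration order.
    public static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String trimmed = line.trim();
            // ARFF keywords are case-insensitive
            if (trimmed.toLowerCase().startsWith("@attribute")) {
                // the name is the second whitespace-separated token
                names.add(trimmed.split("\\s+")[1]);
            }
        }
        return names;
    }

    // Mirrors the default used above: class index = numAttributes - 1.
    public static String defaultClassAttribute(String arff) {
        List<String> names = attributeNames(arff);
        return names.get(names.size() - 1);
    }
}
```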

2. Column selection (attribute selection).

 * =====attributSelection=====

code format="java"
// Needs: weka.core.Instances, weka.filters.Filter, weka.filters.unsupervised.attribute.Remove
/**
 * Select some attributes from a given Instances.
 * @param data An Instances of the data.
 * @param option String which represents the (1-based) attributes to remove,
 *               e.g. "1-4" | "28" | "1-70,45,68-72" | "" | ...
 * @return The new Instances of data without the undesired attributes.
 * @throws Exception
 */
public static Instances attributSelection(Instances data, String option) throws Exception {
    String[] options = new String[2];
    options[0] = "-R";      // range of attributes to remove
    options[1] = option;
    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(data);  // call after setOptions
    Instances newData = Filter.useFilter(data, remove);
    if (newData.classIndex() == -1)
        newData.setClassIndex(newData.numAttributes() - 1);
    return newData;
}
code
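The option string of attributSelection is passed to the `-R` option of Remove, which uses Weka's 1-based range syntax (Weka parses it with `weka.core.Range`, which also accepts `first` and `last`). To check which columns a string such as "1-70,45,68-72" covers, here is a plain-Java sketch with a hypothetical helper `toIndices` that expands it into sorted 0-based column indices. It is not Weka's parser and does not handle inverted ranges.

```java
import java.util.TreeSet;

// Hypothetical helper, not weka.core.Range.
public class RangeSketch {

    // Expands e.g. "1-4", "28", "1-70,45,68-72" or "" into 0-based indices.
    public static TreeSet<Integer> toIndices(String range, int numAttributes) {
        TreeSet<Integer> indices = new TreeSet<>();  // sorted, no duplicates
        if (range.trim().isEmpty()) return indices;  // "" selects nothing
        for (String part : range.split(",")) {
            String[] bounds = part.trim().split("-");
            int from = parse(bounds[0], numAttributes);
            int to = bounds.length > 1 ? parse(bounds[1], numAttributes) : from;
            for (int i = from; i <= to; i++) {
                indices.add(i - 1);  // 1-based -> 0-based
            }
        }
        return indices;
    }

    // Accepts a 1-based number or the keywords "first" and "last".
    private static int parse(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) return 1;
        if (t.equals("last")) return numAttributes;
        return Integer.parseInt(t);
    }
}
```

Overlapping parts, as in "1-70,45,68-72", are harmless: each column is counted once.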

3.1 Based on percent or number.

 * =====percentSelection=====

code format="java"
/**
 * Select some rows of data by indicating between which percentages to select them.
 * @param data An Instances of the data.
 * @param start Fraction (0.0 to 1.0) indicating the first row of the selection.
 * @param end Fraction (0.0 to 1.0) indicating the last row of the selection.
 * @return The new Instances of data with only the desired rows.
 */
public static Instances percentSelection(Instances data, double start, double end) {
    if (end < start) { double temp = start; start = end; end = temp; }  // accept reversed bounds
    int to_start = (int) Math.round(data.numInstances() * start);
    // number of rows to copy (at least one)
    int to_end = Math.max((int) Math.round(data.numInstances() * end) - to_start, 1);

    Instances newData = new Instances(data, to_start, to_end);
    if (newData.classIndex() == -1)
        newData.setClassIndex(newData.numAttributes() - 1);
    return newData;
}
code
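start and end are fractions of the data set (0.0 to 1.0) rather than values from 0 to 100, because they are multiplied directly by numInstances(). The plain-Java sketch below mirrors that arithmetic with a hypothetical helper `rowRange`, which returns the first row and the row count that percentSelection passes to `new Instances(data, first, count)`.

```java
// Hypothetical helper mirroring the arithmetic of percentSelection (no Weka needed).
public class PercentRangeSketch {

    // Returns {firstRow, rowCount} for a data set of numInstances rows.
    public static int[] rowRange(int numInstances, double start, double end) {
        if (end < start) { double tmp = start; start = end; end = tmp; }  // accept reversed bounds
        int first = (int) Math.round(numInstances * start);
        // at least one row is always selected
        int count = Math.max((int) Math.round(numInstances * end) - first, 1);
        return new int[] {first, count};
    }
}
```

For instance, a 70/30 train/test split of 100 rows gives {0, 70} for percentSelection(data, 0.0, 0.7) and {70, 30} for percentSelection(data, 0.7, 1.0).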


 * =====rowNumberSelection=====

code format="java"
/**
 * Select some rows of data by indicating between which row numbers to select them.
 * @param data An Instances of the data.
 * @param start Row number indicating the first row of the selection.
 * @param end Row number indicating the last row of the selection.
 * @return The new Instances of data with only the desired rows.
 */
public static Instances rowNumberSelection(Instances data, int start, int end) {
    if (end < start) { int temp = start; start = end; end = temp; }  // accept reversed bounds
    // Assumed completion by analogy with percentSelection (not recoverable from
    // the original): start and end are 0-based row indices, end inclusive.
    Instances newData = new Instances(data, start, end - start + 1);
    if (newData.classIndex() == -1)
        newData.setClassIndex(newData.numAttributes() - 1);
    return newData;
}
code

3.2 Based on a comparison with a value.

 * =====operatorSelection=====

code format="java"
/**
 * Select the rows whose value in the given column is [>, <, =] the value.
 * Ex: data = operatorSelection(data, 4, '>', -0.3);
 * keeps every row whose value in column number 4 is greater than -0.3.
 * @param data An Instances of the data.
 * @param attribute_index The 1-based index of the numeric attribute column to compare.
 * @param operator Used to choose how to compare: > | < | =
 * @param value The value used to compare.
 * @return The new Instances of data with only the desired rows.
 * @throws Exception
 */

public static Instances operatorSelection(Instances data, int attribute_index, char operator, double value) throws Exception {
    // Needs: weka.core.Instances, weka.filters.Filter,
    //        weka.filters.unsupervised.instance.RemoveWithValues
    RemoveWithValues filter = new RemoveWithValues();
    if (attribute_index > data.numAttributes())
        attribute_index = data.numAttributes();

    int current = 0;
    double epsilon = 0.001;
    String[] options = new String[4];

    // RemoveWithValues removes the rows whose value is smaller than -S;
    // with -V the match is inverted and the rows whose value is >= -S are removed.
    switch (operator) {
        case '>':  // keep the rows >= value + epsilon
            options = new String[4];
            value += epsilon;
            break;
        case '<':  // keep the rows < value
            options = new String[5];
            options[current++] = "-V";
            break;
        case '=': {
            // >= : first keep the rows whose value is >= value
            RemoveWithValues geq = new RemoveWithValues();
            geq.setOptions(new String[] {"-C", String.valueOf(attribute_index), "-S", String.valueOf(value)});
            geq.setInputFormat(data);
            data = Filter.useFilter(data, geq);
            // <= : then keep the rows whose value is < value + epsilon
            options = new String[5];
            options[current++] = "-V";
            value += epsilon;
            break;
        }
        default:
            System.out.println("ERROR: Weka_ManageInstances, operatorSelection, unknown operator.");
            return data;
    }

    options[current++] = "-C";
    options[current++] = String.valueOf(attribute_index);
    options[current++] = "-S";
    options[current++] = String.valueOf(value);

    filter.setOptions(options);
    filter.setInputFormat(data);
    Instances newData = Filter.useFilter(data, filter);
    if (newData.classIndex() == -1)
        newData.setClassIndex(newData.numAttributes() - 1);
    return newData;
}
code
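RemoveWithValues removes the rows that match, and for a numeric attribute a row matches when its value is below the `-S` split point; `-V` inverts the match. The plain-Java sketch below reproduces the keep-conditions that operatorSelection builds from that rule, assuming both passes of the '=' case run, so that '=' keeps values in [value, value + 0.001). The names `keeps` and `select` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib sketch of the row filter built by operatorSelection (no Weka needed).
public class OperatorSketch {

    static final double EPSILON = 0.001;

    // True if x survives the filter for the given operator and value.
    public static boolean keeps(double x, char operator, double value) {
        switch (operator) {
            case '>': return x >= value + EPSILON;               // no -V, split at value + epsilon
            case '<': return x < value;                          // -V, split at value
            case '=': return x >= value && x < value + EPSILON;  // both passes
            default: throw new IllegalArgumentException("unknown operator: " + operator);
        }
    }

    // Keeps the values of one column that pass the filter, in their original order.
    public static List<Double> select(double[] column, char operator, double value) {
        List<Double> kept = new ArrayList<>();
        for (double x : column) {
            if (keeps(x, operator, value)) kept.add(x);
        }
        return kept;
    }
}
```

The epsilon makes '>' and '=' approximate: a value that exceeds the threshold by less than 0.001 is dropped by '>' and kept by '='.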

3.3 To avoid redundancy.

 * =====differentNextSelection=====

code format="java"
/**
 * Delete every row that has the same values as the row just before it,
 * so that only the first row of each run of identical rows is kept.
 * Uses equalsInstance.
 * @param data An Instances of the data (modified in place).
 * @return The Instances of data with only the desired rows.
 */
public static Instances differentNextSelection(Instances data) {
    Instances newData = data;
    // Walk backwards so that deleting a row does not shift the rows still to visit.
    for (int index = newData.numInstances() - 1; index > 0; index--) {
        Instance inst = newData.instance(index);
        Instance next = newData.instance(index - 1);
        if (equalsInstance(inst, next))
            newData.delete(index);
    }
    return newData;
}
code
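differentNextSelection only removes consecutive duplicates: identical rows that are not adjacent are all kept. Here is a plain-Java sketch of the same backward walk over `double[]` rows, with `Arrays.equals` standing in for equalsInstance (whose code is not shown here); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib sketch of differentNextSelection on plain double[] rows.
public class ConsecutiveDedupSketch {

    public static List<double[]> dropConsecutiveDuplicates(List<double[]> rows) {
        List<double[]> result = new ArrayList<>(rows);  // work on a copy
        // Walk backwards so removing an index does not shift rows not yet visited.
        for (int index = result.size() - 1; index > 0; index--) {
            if (Arrays.equals(result.get(index), result.get(index - 1))) {
                result.remove(index);
            }
        }
        return result;
    }
}
```

With the rows {1, 2}, {1, 2}, {3, 4}, {1, 2}, only the second row is dropped: the last {1, 2} survives because its neighbour is {3, 4}.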

4. To add lines.

 * =====concatInstances=====

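code format="java"
// Needs: java.util.ArrayList, weka.core.Instance, weka.core.Instances
/**
 * Return a concatenation of the given Instances: inst1 followed by inst2.
 * inst1 is modified in place; inst2 should have the same attributes as inst1.
 * @param inst1 First Instances (head).
 * @param inst2 Second Instances to add (tail).
 * @return inst1 with the rows of inst2 appended.
 */
public static Instances concatInstances(Instances inst1, Instances inst2) {
    // Copy the rows of inst2 first, so that concatInstances(x, x) also terminates.
    ArrayList<Instance> instAL = new ArrayList<>();
    for (int i = 0; i < inst2.numInstances(); i++)
        instAL.add(inst2.instance(i));
    for (int i = 0; i < instAL.size(); i++)
        inst1.add(instAL.get(i));
    return inst1;
}
code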
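Copying inst2 into a temporary list before appending has a useful side effect: concatenating an Instances with itself doubles it instead of looping forever, because the loop bound no longer grows while rows are appended. A generic plain-Java sketch of the same pattern, with the hypothetical name `concatInPlace`:

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib sketch of the snapshot-then-append pattern used by concatInstances.
public class ConcatSketch {

    // Appends the elements of tail to head (head is modified) and returns head.
    public static <T> List<T> concatInPlace(List<T> head, List<T> tail) {
        // Snapshot the tail first: if head == tail, tail.size() would otherwise
        // grow on every add and the loop would never end.
        List<T> snapshot = new ArrayList<>(tail);
        for (int i = 0; i < snapshot.size(); i++) {
            head.add(snapshot.get(i));
        }
        return head;
    }
}
```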


 * =====addDifferentWithPrevious=====

code format="java"
/**
 * Add the Instance at the end of the Instances if it differs from the last one.
 * @param data An Instances of the data (modified in place).
 * @param inst The Instance to add.
 * @return data, with inst appended if it differs from the last row.
 */
public static Instances addDifferentWithPrevious(Instances data, Instance inst) {
    Instances newData = data;
    // An empty Instances has no last row, so inst is always added.
    if (data.numInstances() == 0 || !equalsInstance(data.instance(data.numInstances() - 1), inst))
        newData.add(inst);
    return newData;
}
code


 * =====addDifferentWithAll=====

code format="java"
/**
 * Add the Instance at the end of the Instances if it differs from every row already there.
 * @param data An Instances of the data (modified in place).
 * @param inst The Instance to add.
 * @return data, with inst appended if no identical row was found.
 */
public static Instances addDifferentWithAll(Instances data, Instance inst) {
    Instances newData = data;

    for (int i = 0; i < newData.numInstances(); i++) {
        if (equalsInstance(data.instance(i), inst))
            return newData;  // already present: nothing to add
    }

    newData.add(inst);
    return newData;
}
code
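addDifferentWithPrevious compares only with the last row, whereas addDifferentWithAll scans every row and so costs one comparison per existing row. A plain-Java sketch of both checks on `double[]` rows, with `Arrays.equals` standing in for equalsInstance and hypothetical method names:

```java
import java.util.Arrays;
import java.util.List;

// Stdlib sketch of addDifferentWithPrevious / addDifferentWithAll on double[] rows.
public class AddIfDifferentSketch {

    // Appends row unless it equals the current last row (an empty list always accepts it).
    public static boolean addIfDifferentFromLast(List<double[]> rows, double[] row) {
        if (!rows.isEmpty() && Arrays.equals(rows.get(rows.size() - 1), row)) return false;
        rows.add(row);
        return true;
    }

    // Appends row only if no existing row equals it (linear scan).
    public static boolean addIfDifferentFromAll(List<double[]> rows, double[] row) {
        for (double[] existing : rows) {
            if (Arrays.equals(existing, row)) return false;
        }
        rows.add(row);
        return true;
    }
}
```

The two can disagree: after {1}, {2}, adding {1} again passes the "last row" check but fails the "all rows" check.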


 * =====addDifferentWithAll_dontCareOfLastAtt=====

code format="java"
/**
 * Used to add a new Instance to the learning Instances, replacing an older one
 * that has the same values but a different prediction (last attribute).
 * @param data The learning Instances (modified in place).
 * @param inst The new Instance, which replaces an older prediction.
 * @param addEvenIfNoSimilar Add inst to data even if there is no similar instance (not only replace).
 * @return The new learning Instances.
 */
public static Instances addDifferentWithAll_dontCareOfLastAtt(Instances data, Instance inst, boolean addEvenIfNoSimilar) {
    Instances newData = data;
    int i = indexOfSame_dontCareOfLastAtt(newData, inst);
    if (i != -1) {
        // Replace: drop the old row and append the new one at the end.
        newData.delete(i);
        newData.add(inst);
    } else if (addEvenIfNoSimilar) {
        newData.add(inst);
    }

    return newData;
}
code
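Because the old row is deleted and the new one appended, a replaced prediction moves to the end of the learning set. Here is a plain-Java sketch on `double[]` rows where the last element plays the class; `sameIgnoringLast` and `indexOfSameIgnoringLast` are hypothetical stand-ins for indexOfSame_dontCareOfLastAtt, whose code is not shown here.

```java
import java.util.List;

// Stdlib sketch of addDifferentWithAll_dontCareOfLastAtt on double[] rows.
public class ReplacePredictionSketch {

    // True if a and b agree on every value except the last one (the prediction).
    static boolean sameIgnoringLast(double[] a, double[] b) {
        if (a.length != b.length) return false;
        for (int k = 0; k < a.length - 1; k++) {
            if (Double.compare(a[k], b[k]) != 0) return false;
        }
        return true;
    }

    // Index of the first row similar to inst, or -1 if none.
    static int indexOfSameIgnoringLast(List<double[]> rows, double[] inst) {
        for (int i = 0; i < rows.size(); i++) {
            if (sameIgnoringLast(rows.get(i), inst)) return i;
        }
        return -1;
    }

    public static void addOrReplace(List<double[]> rows, double[] inst, boolean addEvenIfNoSimilar) {
        int i = indexOfSameIgnoringLast(rows, inst);
        if (i != -1) {
            rows.remove(i);  // drop the old prediction...
            rows.add(inst);  // ...and append the new one at the end
        } else if (addEvenIfNoSimilar) {
            rows.add(inst);
        }
    }
}
```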

5. To delete.

 * =====deleteInstance=====

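code format="java"
/**
 * Delete every occurrence of the Instance inst from data,
 * but stop once only two rows remain.
 * @param data An Instances of the data (modified in place).
 * @param inst The Instance to delete.
 * @return data without the deleted occurrences of inst.
 */
public static Instances deleteInstance(Instances data, Instance inst) {
    Instances newData = data;
    int i = 0;
    // Remove matches one at a time until none is left (or only 2 rows remain).
    while (i != -1 && data.numInstances() > 2) {
        i = indexOfInstance(newData, inst);
        if (i != -1)
            newData.delete(i);
    }
    return newData;
}
code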
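Note the `data.numInstances() > 2` guard in deleteInstance: deletion stops once two rows remain, so some copies of inst can survive. A plain-Java sketch on `double[]` rows (hypothetical names, `Arrays.equals` in place of the comparison done by indexOfInstance) makes this visible:

```java
import java.util.Arrays;
import java.util.List;

// Stdlib sketch of deleteInstance, including its two-row floor.
public class DeleteAllSketch {

    public static void deleteAll(List<double[]> rows, double[] row) {
        int i = 0;
        // Same loop shape as deleteInstance: stop when no match is left
        // or when only two rows remain.
        while (i != -1 && rows.size() > 2) {
            i = indexOf(rows, row);
            if (i != -1) rows.remove(i);
        }
    }

    private static int indexOf(List<double[]> rows, double[] row) {
        for (int k = 0; k < rows.size(); k++) {
            if (Arrays.equals(rows.get(k), row)) return k;
        }
        return -1;
    }
}
```

With the rows {1}, {1}, {2}, {1}, deleting {1} leaves {2}, {1}: the last copy stays because the floor is reached first.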


 * =====deleteClosestInstance=====

code format="java"
/**
 * Return an Instances without the closest Instance to inst.
 * @param data An Instances of the data.
 * @param inst The reference Instance.
 * @param valueMinMax ArrayList of the min and max values taken by each Attribute of data.
 * @return The new data without the closest Instance to inst.
 */
public static Instances deleteClosestInstance(Instances data, Instance inst, ArrayList valueMinMax) {
    Instances newData = data;
    Instance instToDel = Weka_ManageInstances.getClosestInstance(newData, inst, valueMinMax);
    // deleteInstance removes every copy of instToDel, not only one.
    newData = Weka_ManageInstances.deleteInstance(newData, instToDel);
    return newData;
}
code


 * =====deleteClosestInstance=====

code format="java"
/**
 * Return an Instances without the numberToDel closest Instances to inst.
 * @param data An Instances of the data.
 * @param inst The reference Instance.
 * @param valueMinMax ArrayList of the min and max values taken by each Attribute of data.
 * @param numberToDel The number of Instances of data close to inst to delete.
 * @return The new data without the numberToDel closest Instances to inst.
 */
public static Instances deleteClosestInstance(Instances data, Instance inst, ArrayList valueMinMax, int numberToDel) {
    Instances newData = data;
    for (int i = 0; i < numberToDel; i++) {
        // Find and delete the current closest Instance, then repeat.
        Instance instToDel = Weka_ManageInstances.getClosestInstance(newData, inst, valueMinMax);
        newData = Weka_ManageInstances.deleteInstance(newData, instToDel);
    }
    return newData;
}
code
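getClosestInstance itself is not shown on this page. A common way to use per-attribute min and max values is to divide each difference by the attribute's range before computing a Euclidean distance, so that attributes with wide ranges do not dominate. The plain-Java sketch below assumes that distance; it is a guess at the idea behind valueMinMax, not the author's code, and `minMax` here is a hypothetical `double[][]` rather than the original ArrayList.

```java
import java.util.List;

// Hypothetical sketch of a getClosestInstance-style search (assumed distance).
public class ClosestRowSketch {

    // minMax[a] = {min, max} of attribute a over the data.
    // Returns the index of the row closest to target, or -1 if rows is empty.
    public static int indexOfClosest(List<double[]> rows, double[] target, double[][] minMax) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < rows.size(); i++) {
            double[] r = rows.get(i);
            double sum = 0;
            for (int a = 0; a < target.length; a++) {
                double range = minMax[a][1] - minMax[a][0];
                // constant attributes (range 0) contribute nothing
                double diff = range == 0 ? 0 : (r[a] - target[a]) / range;
                sum += diff * diff;
            }
            if (sum < bestDist) { bestDist = sum; best = i; }
        }
        return best;
    }
}
```

Comparing squared distances is enough to find the minimum, so the square root is skipped.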