Generating cross-validation folds (Filter approach)

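The RemoveFolds filter (package weka.filters.unsupervised.instance) can be used to generate the train/test splits used in cross-validation (for stratified folds, use its supervised, stratified counterpart instead). The filter has to be applied twice for each train/test split. The first run uses the -V option, which inverts the selection and outputs all folds except the selected one (the train set). The second run omits -V and outputs only the selected fold (the test set).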
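To make the train/test relation concrete: for -F i, the test set is a single fold, and the train set (-V) is everything else. As far as I know, Weka cuts the folds as contiguous blocks from the (seed-randomized) data and gives one extra instance to each of the first N_INST mod N_FOLDS folds (see Instances.testCV in the Weka source). The sketch below mirrors that assumed scheme in plain bash. fold_range is a hypothetical helper, not part of Weka, so check the partitioning against the Weka source before relying on it.

```bash
#!/bin/bash
# Hypothetical helper: prints "FIRST COUNT" for fold FOLD (1-based, like the
# -F option of RemoveFolds) out of N_FOLDS folds over N_INST instances.
# FIRST is the 0-based position of the first test instance in the
# (possibly shuffled) data. Assumes the contiguous-block scheme in which
# the first N_INST % N_FOLDS folds receive one extra instance.
fold_range() {
  local n_inst=$1 n_folds=$2 fold=$(( $3 - 1 ))   # convert to 0-based
  local size=$(( n_inst / n_folds ))
  local rest=$(( n_inst % n_folds ))
  local count offset
  if (( fold < rest )); then
    count=$(( size + 1 )); offset=$fold
  else
    count=$size; offset=$rest
  fi
  echo "$(( fold * size + offset )) $count"
}

# Test set of fold i = that block; train set (-V) = all other instances.
N=23
for i in 1 2 3 4 10; do
  read -r first count <<< "$(fold_range "$N" 10 "$i")"
  echo "fold $i: test = positions $first-$(( first + count - 1 )), train = the other $(( N - count ))"
done
```

With 23 instances and 10 folds, this sketch gives the first three folds 3 test instances each and the remaining folds 2 each, so every train set holds either 20 or 21 instances.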

Since running the filter twice for every fold by hand is rather cumbersome, the whole process can also be put into a bash script:
code format="bash"
#!/bin/bash
#
# expects the weka.jar as first parameter and the dataset to work on as
# second parameter.
#
# FracPete, 2007-04-10

if [ $# -ne 2 ]
then
  echo
  echo "usage: folds.sh <jar> <dataset>"
  echo
  exit 1
fi

JAR=$1
DATASET=$2
FOLDS=10
FILTER=weka.filters.unsupervised.instance.RemoveFolds
SEED=1
# output prefix: the dataset path without its .arff extension
OUTFILE=${DATASET%.arff}

for ((i = 1; i <= FOLDS; i++))
do
  echo "Generating pair $i/$FOLDS..."
  # train set: -V inverts the selection, i.e., keeps all folds except fold $i
  java -cp "$JAR" $FILTER -V -N $FOLDS -F $i -S $SEED -i "$DATASET" -o "$OUTFILE-train-$i-of-$FOLDS.arff"
  # test set: keeps only fold $i
  java -cp "$JAR" $FILTER    -N $FOLDS -F $i -S $SEED -i "$DATASET" -o "$OUTFILE-test-$i-of-$FOLDS.arff"
done
code
The script expects two parameters:
 * 1) the weka.jar (or the path to the Weka classes)
 * 2) the dataset to generate the train/test pairs from

Example:
code format="text"
./folds.sh /some/where/weka.jar /some/where/else/dataset.arff
code

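This example creates the train/test pairs for a 10-fold cross-validation in the same location as the original dataset, i.e., in the /some/where/else directory: dataset-train-1-of-10.arff, dataset-test-1-of-10.arff, and so on up to dataset-test-10-of-10.arff.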
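After running the script, it can be worth checking its output. The following sketch checks that every train/test file named as in folds.sh exists and that each pair together holds as many instances as the original dataset. The functions count_instances and verify_pairs are hypothetical and not part of Weka. The sketch assumes plain (uncompressed) ARFF files with one instance per line after @data and with comment lines starting with %.

```bash
#!/bin/bash
# Hypothetical helper: counts data rows in an ARFF file, i.e., non-blank,
# non-comment lines that follow the @data line (case-insensitive).
count_instances() {
  awk 'data && NF && $0 !~ /^[ \t]*%/ { n++ }
       tolower($1) == "@data" { data = 1 }
       END { print n + 0 }' "$1"
}

# Hypothetical check: usage: verify_pairs <dataset.arff> [folds]
# Uses the same naming scheme as folds.sh: <base>-train-<i>-of-<folds>.arff
verify_pairs() {
  local dataset=$1 folds=${2:-10}
  local base=${dataset%.arff}
  local total i train_file test_file
  total=$(count_instances "$dataset")
  for (( i = 1; i <= folds; i++ )); do
    train_file="$base-train-$i-of-$folds.arff"
    test_file="$base-test-$i-of-$folds.arff"
    if [ ! -f "$train_file" ] || [ ! -f "$test_file" ]; then
      echo "missing pair $i/$folds" >&2
      return 1
    fi
    # train + test must add up to the size of the original dataset
    if (( $(count_instances "$train_file") + $(count_instances "$test_file") != total )); then
      echo "pair $i/$folds does not add up to $total instances" >&2
      return 1
    fi
  done
  echo "all $folds pairs OK ($total instances)"
}
```

For the example above, one would source this file and call verify_pairs /some/where/else/dataset.arff 10.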

= Downloads =
 * [[file:folds.sh]]