Sometimes one wants to binarize a nominal attribute of a dataset by grouping all values except the one of interest together as the negation of that value. For example, for the outlook attribute of the weather data, sunny may be the value of interest, while the other values, rainy and overcast, are grouped together as not_sunny.

Original dataset:
 @relation weather
 
 @attribute outlook {sunny, overcast, rainy}
 @attribute temperature real
 @attribute humidity real
 @attribute windy {TRUE, FALSE}
 @attribute play {yes, no}
 
 @data
 sunny,85,85,FALSE,no
 sunny,80,90,TRUE,no
 overcast,83,86,FALSE,yes
 rainy,70,96,FALSE,yes
 rainy,68,80,FALSE,yes
 rainy,65,70,TRUE,no
 overcast,64,65,TRUE,yes
 sunny,72,95,FALSE,no
 sunny,69,70,FALSE,yes
 rainy,75,80,FALSE,yes
 sunny,75,70,TRUE,yes
 overcast,72,90,TRUE,yes
 overcast,81,75,FALSE,yes
 rainy,71,91,TRUE,no
Desired output:
 @relation weather-sunny-and-not_sunny
 
 @attribute outlook {sunny,not_sunny}
 @attribute temperature numeric
 @attribute humidity numeric
 @attribute windy {TRUE,FALSE}
 @attribute play {yes,no}
 
 @data
 sunny,85,85,FALSE,no
 sunny,80,90,TRUE,no
 not_sunny,83,86,FALSE,yes
 not_sunny,70,96,FALSE,yes
 not_sunny,68,80,FALSE,yes
 not_sunny,65,70,TRUE,no
 not_sunny,64,65,TRUE,yes
 sunny,72,95,FALSE,no
 sunny,69,70,FALSE,yes
 not_sunny,75,80,FALSE,yes
 sunny,75,70,TRUE,yes
 not_sunny,72,90,TRUE,yes
 not_sunny,81,75,FALSE,yes
 not_sunny,71,91,TRUE,no
The Weka filter NominalToBinary cannot be used directly, since it generates a new attribute for each value of the nominal attribute. As a postprocessing step one could delete all the attributes that are of no interest, but this is quite cumbersome.

The class, on the other hand, directly generates several ARFF files in the desired format from a given one.
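
The core relabeling step can be sketched in a few lines of plain Java. This is only an illustration, not the downloadable class: it does not use the Weka API, and the names `BinarizeSketch`, `binarize` and `binarizeAll` are hypothetical. It assumes that "several ARFF" means one binarized variant per value of the attribute, and that missing values (written `?` in ARFF) should stay missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Weka and not the class offered for download.
public class BinarizeSketch {

    /** Keeps the value of interest and maps every other value to "not_" + keep. */
    public static List<String> binarize(List<String> values, String keep) {
        String negation = "not_" + keep;
        List<String> result = new ArrayList<>(values.size());
        for (String v : values) {
            if (v.equals("?")) {
                result.add(v);          // assumption: missing values stay missing
            } else {
                result.add(v.equals(keep) ? keep : negation);
            }
        }
        return result;
    }

    /** Builds one binarized column per label, in the order the labels are declared. */
    public static Map<String, List<String>> binarizeAll(List<String> labels, List<String> values) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String label : labels) {
            columns.put(label, binarize(values, label));
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("sunny", "overcast", "rainy");
        // First five outlook values of the weather data shown above.
        List<String> outlook = List.of("sunny", "sunny", "overcast", "rainy", "rainy");
        binarizeAll(labels, outlook).forEach((label, column) ->
            System.out.println("@attribute outlook {" + label + ",not_" + label + "}: " + column));
    }
}
```

Writing each resulting column back into a full ARFF file (relation name, remaining attributes, data rows) is left out here; with Weka on the classpath, that part would normally be handled by its own ARFF loading and saving classes.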

Download