This article describes how to compile, train, and use the MessageClassifier from the 2nd edition of the Data Mining book by Witten and Frank.

Source code

Download the version of the source code that matches your edition of the book (this article is based on the 2nd edition):

Compiling

  • If weka.jar is already in your CLASSPATH environment variable, compile the source code like this:
 javac MessageClassifier.java
  • Otherwise, use this command line (replace /path/to/ with the correct path on your system):
 javac -classpath /path/to/weka.jar MessageClassifier.java
Note: The remaining examples omit the classpath option.

Training

When you run the MessageClassifier for the first time, you need to provide labeled examples to build a classifier from, i.e., messages ("-m") and their corresponding classes ("-c"). Since the data and the model are kept for future use, you also have to specify the file that the MessageClassifier is serialized to ("-t").

Here's an example that labels the message email1.txt as miss:
 java MessageClassifier -m email1.txt -c miss -t messageclassifier.model
Repeat this for every labeled message you want to train the classifier with.
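
The role of the "-t" file can be illustrated with a minimal sketch of a load-or-create cycle based on standard Java serialization. This is not the code from the book: ToyModel is a hypothetical stand-in that only remembers the labeled messages it has seen, and the names ModelStore, loadOrCreate, and save are assumptions made for this illustration. The real MessageClassifier may handle its model file differently.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ModelStore {

    // Hypothetical stand-in for the real classifier: it only stores
    // (message, label) pairs and does no learning.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String[]> labeled = new ArrayList<>();

        void update(String message, String label) {
            labeled.add(new String[] {message, label});
        }
    }

    // Load the model if the file exists, otherwise start with an empty one.
    static ToyModel loadOrCreate(File file) throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new ToyModel();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    // Write the model back so the next run can continue from it.
    static void save(ToyModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args.length > 0 ? args[0] : "toy.model");
        ToyModel model = loadOrCreate(file);
        model.update("example message text", "miss");
        save(model, file);
        System.out.println("Labeled messages stored: " + model.labeled.size());
    }
}
```

Running the sketch several times with the same file name grows the stored list by one entry per run, which mirrors why repeated training calls must all point at the same model file.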

Classifying

Classifying an unseen message is straightforward: just omit the class option ("-c"). The following call
 java MessageClassifier -m email1023.txt -t messageclassifier.model
will produce something like this:
 Message classified as : miss
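
The difference between the training call and the classification call comes down to whether "-c" is present. The following sketch shows that decision in isolation. The class ModeSketch and its methods are hypothetical names invented for this illustration, and the sketch assumes that "-m" and "-t" are both required; the actual option parsing in MessageClassifier may differ.

```java
class ModeSketch {

    // Return the value that follows the given flag, or null if the flag is absent.
    static String optionValue(String[] args, String flag) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals(flag)) {
                return args[i + 1];
            }
        }
        return null;
    }

    // Decide what a call would do, based only on which options are present.
    static String describe(String[] args) {
        String message = optionValue(args, "-m");
        String model = optionValue(args, "-t");
        String label = optionValue(args, "-c");
        if (message == null || model == null) {
            return "usage error: -m and -t are required";
        }
        if (label != null) {
            return "train on " + message + " as " + label;
        }
        return "classify " + message;
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Passing the arguments of the training example from this article yields the training branch, while leaving out "-c" yields the classification branch.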