MessageClassifier


In the following you'll find some information about the MessageClassifier from the **2nd** edition of the [|Data Mining] book by Witten and Frank.

= Source code =
Depending on the version of the book, download the corresponding version (this article is based on the **2nd** edition):
 * 1st Edition: [|MessageClassifier]
 * 2nd Edition: [|MessageClassifier] ([|book], [|stable-3.6], [|developer])

= Compiling =
 * Compile the source code like this, if weka.jar is already in your CLASSPATH environment variable:
code format="bash"
javac MessageClassifier.java
code
 * Otherwise, use this command line (of course, replace /path/to/weka.jar with the correct path on your system):
code format="bash"
javac -classpath /path/to/weka.jar MessageClassifier.java
code

**Note:** The classpath handling is omitted from here on.

= Training =
If you run the MessageClassifier for the first time, you need to provide labeled examples to build a classifier from, i.e., messages ("-m") and the corresponding classes ("-c"). Since the data and the model are kept for future use, you also have to specify the file that the MessageClassifier is serialized to ("-t").

Here's an example that labels the message //email1.txt// as //miss//:
code format="bash"
java MessageClassifier -m email1.txt -c miss -t messageclassifier.model
code
Repeat this for every message you want to use as a labeled training example.
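
Feeding many labeled messages one at a time can be scripted. The following is only a minimal sketch, not part of the book's code: it assumes a hypothetical directory layout in which each subdirectory is named after a class (e.g. //training/hit/// and //training/miss///) and contains the messages for that class as //.txt// files. The function name //train_all// and the //JAVA_CMD// variable (handy for a dry run with //echo//) are made up for illustration.

```shell
# Sketch: train on every message below a directory of per-class folders.
# Assumed (hypothetical) layout: <dir>/<class>/*.txt
# As in the rest of this article, weka.jar and MessageClassifier are
# expected to be on the CLASSPATH already.
train_all() {
  dir="$1"     # directory containing one subdirectory per class
  model="$2"   # file the MessageClassifier is serialized to (-t)
  for label_dir in "$dir"/*/; do
    [ -d "$label_dir" ] || continue        # no class folders at all
    label=$(basename "$label_dir")         # folder name is the class (-c)
    for msg in "$label_dir"*.txt; do
      [ -e "$msg" ] || continue            # folder without any .txt files
      # JAVA_CMD is a hypothetical hook; it defaults to plain "java".
      ${JAVA_CMD:-java} MessageClassifier -m "$msg" -c "$label" -t "$model"
    done
  done
}

# Usage (dry run, only prints the calls):
#   JAVA_CMD=echo train_all training messageclassifier.model
```

Each call updates the same model file, so the order of the messages should not matter for this purpose; run the dry run first to check that the labels are picked up as intended.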

= Classifying =
Classifying an unseen message is straightforward: just omit the class option ("-c"). The following call
code format="bash"
java MessageClassifier -m email1023.txt -t messageclassifier.model
code
will produce something like this:
code format="text"
Message classified as : miss
code
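
To classify several unseen messages in one go, the call can be wrapped in a small loop. Again, this is just a sketch under assumptions: the function name //classify_all// and the //JAVA_CMD// variable are hypothetical, and the model file is assumed to have been built beforehand as described in the training section.

```shell
# Sketch: classify a list of unseen messages with an existing model.
# Prints the file name in front of whatever MessageClassifier reports.
classify_all() {
  model="$1"   # previously trained model file (-t)
  shift        # remaining arguments are the message files
  for msg in "$@"; do
    printf '%s: ' "$msg"
    # No -c option here, so the message is classified, not learned.
    # JAVA_CMD is a hypothetical hook; it defaults to plain "java".
    ${JAVA_CMD:-java} MessageClassifier -m "$msg" -t "$model"
  done
}

# Usage:
#   classify_all messageclassifier.model email1023.txt email1024.txt
```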