Datasets

= Overview =
 * A jarfile containing 37 classification problems, originally obtained from the [|UCI repository] ([|datasets-UCI.jar], 1,190,961 Bytes).
 * A jarfile containing 37 regression problems, obtained from various sources ([|datasets-numeric.jar], 169,344 Bytes).
 * A jarfile containing 6 agricultural datasets obtained from agricultural researchers in New Zealand ([|agridatasets.jar], 31,200 Bytes).
 * A jarfile containing 30 regression datasets collected by Luis Torgo ([|regression-datasets.jar], 10,090,266 Bytes).
 * A gzipped tar archive containing [|UCI] and [|UCI KDD] datasets ([|uci-20050214.tar.gz], 15,308,385 Bytes).
 * A gzipped tar archive containing [|StatLib] datasets ([|statlib-20050214.tar.gz], 12,785,582 Bytes).
 * A gzipped tar archive containing ordinal, real-world datasets donated by Dr. Arie Ben David (Holon Institute of Technology, Israel) ([|datasets-arie_ben_david.tar.gz], 11,348 Bytes).
 * A zip file containing 19 multi-class (1-of-n) text datasets donated by [|George Forman]/[|Hewlett-Packard Labs] ([|19MclassTextWc.zip], 14,084,828 Bytes).
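
A jarfile uses the zip archive format, so the jar and zip downloads above can be inspected with any zip tool. The following is a minimal sketch in Python using the standard `zipfile` module. It assumes the datasets inside the archive are stored as `.arff` files, which this page does not state explicitly. The function names `list_arff_members` and `extract_arff_members` are hypothetical helpers for illustration and are not part of Weka.

```python
import zipfile


def _is_arff(name):
    """True if the archive entry name looks like an ARFF file (assumed extension)."""
    return name.lower().endswith(".arff")


def list_arff_members(archive_path):
    """Return the sorted names of all .arff entries in a jar or zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if _is_arff(name))


def extract_arff_members(archive_path, target_dir):
    """Extract only the .arff entries into target_dir and return their names."""
    with zipfile.ZipFile(archive_path) as archive:
        members = sorted(name for name in archive.namelist() if _is_arff(name))
        archive.extractall(target_dir, members=members)
    return members
```

For example, after downloading one of the jarfiles, `list_arff_members("datasets-UCI.jar")` would return the paths of the ARFF files it contains, if any. The `.tar.gz` downloads are not zip archives and would need the `tarfile` module instead.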

= Other datasets in ARFF format =
 * [|Protein data sets], maintained by [|Shuiwang Ji], [|CS Department, Louisiana State University]
 * [|Kent Ridge Biomedical Data Set Repository], maintained by [|Jinyan Li] and Huiqing Liu, [|Institute for Infocomm Research], Singapore

= Links =
 * [|UCI]
 * [|UCI KDD]
 * [|StatLib]
 * [|Weka on SourceForge]