Section 2 on learning-based filters (last session)
Batch and Online Spam Filter Comparison; Gordon Cormack and Andrej Bratko. Gordon offers a plug for TREC ;) Read the paper, I can’t summarise this one! Captivating presentation by Gordon on the good, the bad and the ugly in a comparison of the DMC, PPM, Bogofilter, LR and SVM classifiers. Possibly the best of the conference so far.
Online Discriminative Spam Filter Training; Joshua Goodman, Wen-tau Yih (Microsoft). Joshua shows that a simple discriminative method can be used ‘online’ and would be a valuable component of stacked classifiers.
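To make "online" training concrete, here is a minimal sketch, assuming the discriminative method is logistic regression updated one message at a time by gradient descent. These notes do not record the exact model, features or learning rate from the talk, so the class name `OnlineLogisticFilter`, the binary token features and the value 0.1 are all illustrative assumptions.

```python
import math


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticFilter:
    """Hypothetical online spam filter: one gradient step per labelled message."""

    def __init__(self, learning_rate=0.1):  # learning rate is an assumed value
        self.learning_rate = learning_rate
        self.weights = {}  # token -> weight, created lazily
        self.bias = 0.0

    def predict(self, tokens):
        # Binary features: each distinct token counts once.
        z = self.bias + sum(self.weights.get(t, 0.0) for t in set(tokens))
        return sigmoid(z)  # estimated probability that the message is spam

    def update(self, tokens, is_spam):
        # Gradient of the log-likelihood for a single example.
        error = (1.0 if is_spam else 0.0) - self.predict(tokens)
        step = self.learning_rate * error
        self.bias += step
        for t in set(tokens):
            self.weights[t] = self.weights.get(t, 0.0) + step


# Toy stream of (tokens, label) pairs; made-up data for illustration only.
spam_filter = OnlineLogisticFilter()
stream = [(["cheap", "pills"], True), (["meeting", "agenda"], False)] * 20
for tokens, label in stream:
    spam_filter.update(tokens, label)
```

The point of the sketch is that the model never needs the whole corpus at once: each message adjusts the weights as soon as its label arrives, which is what lets such a filter slot into a stack of classifiers.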
Learning at Low False Positive Rates; Wen-tau Yih, Joshua Goodman, Geoff Hulten. They combine training with utility and two-stage filtering, for both Naive Bayes and logistic regression, so that training targets the low false positive region.
Fast Uncertainty Sampling for Labeling Large E-mail Corpora; Richard Segal, Ted Markowitz, William Arnold. Richard (IBM, ~80% spam) presents on the ideal test corpora up front on this one. Feedback loops, freshness and bias features of corpora all appear on the first slide. “Active learning” is their process of asking a human as few questions as possible in order to classify a corpus to a set accuracy, based on “the best m messages”. This is commonly known as uncertainty sampling, which is usually high cost. They propose a simple method of approximate uncertainty sampling in the paper which is far less costly and almost as effective. Future work will use stacked classifiers.
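For readers new to the idea, plain uncertainty sampling can be sketched as below. This is the textbook version, not the approximate method proposed in the paper. The function name `select_most_uncertain`, the message ids and the scores are made up for illustration, and the sketch assumes the classifier outputs a spam probability between 0 and 1.

```python
def select_most_uncertain(scores, m):
    """Return the ids of the m messages whose spam probability is closest to 0.5.

    scores: dict mapping message id -> current classifier spam probability.
    """
    # A probability near 0.5 means the classifier is least sure, so a human
    # label for that message is expected to be the most informative.
    ranked = sorted(scores, key=lambda msg_id: abs(scores[msg_id] - 0.5))
    return ranked[:m]


# Hypothetical scores from some already-trained classifier.
scores = {"msg1": 0.98, "msg2": 0.51, "msg3": 0.10, "msg4": 0.45, "msg5": 0.70}
to_label = select_most_uncertain(scores, 2)
```

The cost comes from the loop around this step: after each batch of human labels the classifier is retrained and every unlabelled message is rescored before the next selection, which is expensive on a large corpus.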
Phew! Great conference, with a huge population of academic attendees and content. Well organized, and a very capable venue.

