Modeling Identity in Archival Collections of Email: A Preliminary Study; Tamer Elsayed, Douglas Oard. More nodal analysis of the Enron corpus using address-name and nickname-address associations in headers and quoted headers. 96.7% entity accuracy is pretty impressive, especially with the weakest evidence. I have to wonder what else this could be used for if they can move to machine learning techniques rather than the hand-tuned process.
Introducing the Webb Spam Corpus: Using Email Spam to Identify Web Spam Automatically; Steve Webb, James Caverlee, Calton Pu – Steve Webb on expanding the definition of web spam outward to include the headers as comments in HTML and all the redirect metadata too. The work is based on the SpamArchive data: keep a corpus fresh with data and metadata, and use that to mine out the “spam” from link farms.
Annotating Subsets of the Enron Email Corpus; Jade Goldstein, Andres Kwasinksi, Paul Kingsbury, Roberta Evans Sabin, Albert McDowell – Correlating the voice transcripts of 98 Enron calls with the Enron corpus, attempting to relate topics on a wider scale.
An Exploratory Study of the W3C Mailing List Test Collection for Retrieval of Emails with Pro/Con Argument; Yejun Wu, Douglas Oard, Ian Soboroff. Yejun discusses the w3c.org collection parsed from the website HTML. They assign relevance to topics and categories, though in conclusion they propose using SVM as an alternative classification method. I’d like to see more on this one in a year’s time.

