Recently I read "Machine Learning for Email", published by O'Reilly. Because the examples are written in the R programming language, the reader can concentrate on the main goal: understanding the core procedures of machine learning. The author also explains the code precisely, so readers can follow the techniques clearly even if they do not understand every part of the code.
The machine learning methods the book introduces are just basic statistical methods. Readers who already have experience with machine learning may find them a little tedious. However, the approach is sufficient to classify spam and ham.
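To give a feel for how little machinery such a classifier needs, here is a minimal sketch in Python rather than the book's R. It assumes a naive Bayes word-count classifier with add-one smoothing, which is one common basic statistical approach and my own choice for illustration, not necessarily the exact method the book uses. The function names and the four toy messages are made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def train(messages):
    # messages: list of (text, label) pairs, label is "spam" or "ham".
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    # Assumes both labels appear at least once in the training data.
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior for the class.
        score = math.log(doc_counts[label] / total_docs)
        for w in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from the book.
training = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
word_counts, doc_counts = train(training)
```

With this toy data, `classify("free money prize", word_counts, doc_counts)` picks the spam class because those words appear only in the spam messages. A real corpus would need better tokenization and far more training mail.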
In addition to classifying spam and ham, the book introduces a way to rank emails, with many practical ideas:
- If a user replies soon after viewing an email, that email is probably important to them.
- If a user interacts with a thread over a long period, the thread is probably important to them. Therefore, the terms that appear in the thread are ranked high.
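The two ideas above can be sketched as simple weighting functions. This is only an illustration under my own assumptions: the function names, the hourly scale, and the use of a reciprocal and a logarithm to turn time spans into weights are hypothetical choices, not the book's formulas.

```python
import math
from collections import defaultdict
from datetime import datetime


def response_weight(viewed_at, replied_at):
    # A shorter delay between viewing and replying gives a larger weight.
    delay_hours = (replied_at - viewed_at).total_seconds() / 3600.0
    return 1.0 / (1.0 + delay_hours)


def thread_weight(first_at, last_at):
    # A thread the user interacts with for longer gets a larger weight.
    # log1p damps very long threads; this damping is an assumption.
    active_hours = (last_at - first_at).total_seconds() / 3600.0
    return math.log1p(active_hours)


def term_weights(threads):
    # threads: list of dicts with "first", "last", and "terms" keys.
    # Each term inherits the weight of every thread it appears in.
    weights = defaultdict(float)
    for t in threads:
        w = thread_weight(t["first"], t["last"])
        for term in set(t["terms"]):
            weights[term] += w
    return dict(weights)


# Hypothetical example threads.
threads = [
    {"first": datetime(2012, 1, 1, 9), "last": datetime(2012, 1, 3, 9),
     "terms": ["budget", "review"]},
    {"first": datetime(2012, 1, 2, 12), "last": datetime(2012, 1, 2, 13),
     "terms": ["lunch"]},
]
weights = term_weights(threads)
```

Here the terms from the two-day thread end up with a higher weight than the term from the one-hour thread, so a new email that mentions them would rank higher.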
Throughout the book, the sample code uses real sample email data that can be obtained from the web, so the machine learning methods it introduces address practical use cases, simple as they are.