Monday, 3 September 2018

Introduction to Classification - The Spam Filter

A spam filter is a simple example of statistical classification: the task of assigning data to one of a set of known categories (here, "spam" or "not spam").

Classification is a form of supervised learning, in which the algorithm is given a training set of observations whose categories have already been correctly identified. An algorithm that performs classification is called a classifier.

One simple family of classification algorithms is Naive Bayes, which has been studied extensively since the 1950s.
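The post does not spell out the formula, so here is a brief sketch of the standard idea. Naive Bayes applies Bayes' theorem and makes the "naive" assumption that each word in a document is independent of the others once the class is known. For a document containing the words $w_1, \dots, w_n$, the classifier picks the class $c$ with the highest posterior probability (the evidence $P(w_1, \dots, w_n)$ is the same for every class, so it can be dropped):

```latex
\hat{c} = \arg\max_{c} P(c \mid w_1, \dots, w_n)
        = \arg\max_{c} P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

Both $P(c)$ and $P(w_i \mid c)$ are estimated by counting in the labelled training set.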

Naive Bayes (NB) is a popular choice for text classification, e.g. deciding whether a document is a novel, a poem or an essay. It is also used in word sense disambiguation.
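To make the spam filter concrete, here is a minimal sketch of a Naive Bayes classifier in Python. It assumes a bag-of-words model with simple whitespace tokenization and add-one (Laplace) smoothing. The class name `NaiveBayesSpamFilter`, the `alpha` parameter and the four-message training set are hypothetical and chosen only for illustration. A real filter would need far more data and better tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes classifier (not a production filter)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha           # Laplace smoothing constant (assumed value)
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, examples):
        # examples: iterable of (text, label) pairs with known, correct labels.
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label), summed in log space.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + self.alpha) / denom)
        return score

    def classify(self, text):
        # Pick the label with the highest (log) posterior score.
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical training data: (message, label) pairs.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
spam_filter.train(training)
print(spam_filter.classify("win cheap money"))   # spam
print(spam_filter.classify("meeting tomorrow"))  # ham
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied together. Smoothing stops a word that never appeared with one class from forcing that class's probability to zero.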
