Monday, 3 September 2018

Introduction to Classification - The Spam Filter

A spam filter is a simple example of statistical classification, putting data into known categories.

Classification is a case of supervised learning, where a training set of correctly identified observations is provided. An algorithm that performs classification is called a classifier.

A simple class of classification algorithms is called Naive Bayes, which has been studied extensively since the 1950s.

Naive Bayes (NB) is a popular choice for text classification, e.g. deciding whether a text is a novel, a poem or an essay. It is also used in word sense disambiguation.
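To make the spam filter idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The function names (train_naive_bayes, classify), the add-one smoothing and the four-message training set are illustrative assumptions rather than anything from a particular library, and a real filter would need far more data and better tokenisation.

```python
import math
from collections import Counter


def train_naive_bayes(examples):
    """Count words per label from (text, label) pairs (hypothetical helper)."""
    label_counts = Counter()
    word_counts = {}      # label -> Counter of words seen under that label
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Return the label with the highest log posterior score."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior: the fraction of training messages with this label.
        score = math.log(doc_count / total_docs)
        for word in text.lower().split():
            if word not in model["vocabulary"]:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented purely for illustration.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, "win a cash prize"))
print(classify(model, "monday meeting"))
```

The "naive" part is the assumption that words are independent given the label, which is why the score is just a sum of per-word log probabilities.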

Discriminative Models (aka Conditional Models)

Discriminative models, also known as conditional models, are used in machine learning to model the dependence of unobserved variables on observed variables. The dependence is modeled probabilistically as the conditional distribution P(y|x), where y is the vector of unobserved variables and x is the vector of observed variables.
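Logistic regression is one standard example of a discriminative model, since it fits P(y|x) directly. The sketch below is a one-feature version trained by plain gradient descent. The names (fit_logistic, prob_y_given_x), the learning rate, the epoch count and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each observation under the current w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def prob_y_given_x(x, w, b):
    """The modeled conditional probability P(y=1|x)."""
    return sigmoid(w * x + b)


# Toy data: x is observed, y is the label we want to predict.
XS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
YS = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(XS, YS)
print(prob_y_given_x(0.0, w, b), prob_y_given_x(5.0, w, b))
```

Note that nothing here models how x itself is distributed; only the conditional P(y|x) is learned, which is what makes the model discriminative.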

What is Nonparametric Statistics?

"Conventional" (parametric) statistics assumes that the data come from a distribution described by a small set of parameters, like mean and variance.

"Nonparametric" statistics relies on being "distribution free", that is, on not assuming a particular distribution, or on leaving the distribution's parameters unspecified.

"Support Vector Machines" are a form of nonparametric statistics useful in machine learning. They are "discriminative" classifiers.

Sunday, 19 August 2018

Analytic Number Theory

Ah, so you've heard of the Riemann zeta function, have you not, and no doubt you will want to start programming with it, yes?  If so, read on!

To know it in its contemporary form, you should have a basic knowledge of complex analysis, including Cauchy's theorem and contour integration. (Note that Euler studied this function earlier without complex analysis, but his analysis was limited to the zeta function as a function of a real variable, rather than a complex variable.)
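If you want to start programming with it straight away, the real-variable case that Euler worked with is easy to sketch. The code below approximates zeta(s) for real s > 1 in two ways: from the defining series with a simple tail correction, and from Euler's product over primes. The function names, the default cut-offs and the choice of tail correction are assumptions for illustration. This is not how a numerical library evaluates the function, and it does not cover complex arguments.

```python
import math


def zeta_real(s, terms=1000):
    """Approximate zeta(s) for real s > 1 from the series sum of n**-s."""
    if s <= 1:
        raise ValueError("this sketch only handles real s > 1")
    partial = sum(n ** -s for n in range(1, terms + 1))
    # Tail estimate: the integral of x**-s from `terms` to infinity,
    # minus half of the last term (first Euler-Maclaurin correction).
    tail = terms ** (1 - s) / (s - 1) - 0.5 * terms ** -s
    return partial + tail


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def zeta_euler_product(s, limit=10000):
    """Approximate zeta(s) by Euler's product over the primes <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** -s)
    return product


print(zeta_real(2.0), math.pi ** 2 / 6)
print(zeta_euler_product(2.0))
```

The two approximations agreeing is the first hint of why this function matters to number theory: a sum over all integers equals a product over the primes.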

A flavour of the subject can be found here and here (the latter addresses analytic number theory under the headline of multiplicative number theory). Prepare to be dazzled by the spectral theory of automorphic forms.

Tuesday, 15 May 2018

The Los Alamos Background to the GSL

The GSL (GNU Scientific Library) project was initiated in 1996 by Los Alamos physicists Mark Galassi and James Theiler.

Friday, 23 March 2018

GNU Cim is Open Source Simula

Want to do some Simula? Try the GNU Cim compiler. It is written in C and produces C, which is then passed to a C compiler to be translated to machine code. Simula was developed in the 1960s as a simulation language at the Norwegian Computing Center (Norsk Regnesentral) in Oslo. It had two incarnations, Simula I and Simula 67.

Monday, 26 February 2018

The Python Hacker's Guide to Probability from the True Pioneer Himself and an Ode to Lebesgue

A Python hacker wanting to learn probability from the founder of modern probability theory should read Foundations of the Theory of Probability by Kolmogorov. It is a monograph written "to give an axiomatic foundation for the theory of probability". Kolmogorov acknowledges a debt of gratitude to the Beauvais-born Monsieur Henri Lebesgue, specifically to "Lebesgue's theories of measure and integration" (due to the connection between the measure of a set and the probability of an event).
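For the Python hacker, the link between the measure of a set and the probability of an event can be tried out on a finite sample space. The sketch below treats a fair six-sided die as a measure on the set {1, ..., 6} and checks the axioms in their finite form: non-negativity, total mass one, and additivity over disjoint events. The names (DIE, prob) and the die example are assumptions chosen for illustration. Kolmogorov's axioms also cover infinite sample spaces, which a finite dictionary cannot capture.

```python
from fractions import Fraction

# A fair die: each outcome in the sample space gets measure 1/6.
DIE = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, measure=DIE):
    """Probability of an event, i.e. the measure of a set of outcomes."""
    return sum((measure[outcome] for outcome in event), Fraction(0))


sample_space = set(DIE)
evens = {2, 4, 6}
low_odds = {1, 3}

# Axiom 1: probabilities are non-negative.
assert all(weight >= 0 for weight in DIE.values())
# Axiom 2: the whole sample space has probability one.
assert prob(sample_space) == 1
# Axiom 3 (finite form): disjoint events add.
assert evens.isdisjoint(low_odds)
assert prob(evens | low_odds) == prob(evens) + prob(low_odds)

print(prob(evens), prob(low_odds), prob(evens | low_odds))
```

Using Fraction instead of floats keeps the checks exact, so the equalities hold without rounding tolerance.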