Friday, 29 June 2012

Before rpy - Read This

Before tackling the Python/R/Statistics triathlon, you may be interested in reading "R in Action" as reviewed on Revolutions.

Wednesday, 25 January 2012

lorenz.py demystified

First, an Ode to matplotlib (for providing lorenz.py)

lorenz.py is a Lorenz attractor implementation example from matplotlib.

Second, an Ode to Chaos Theory (Pioneer)

The Lorenz attractor is named after the chaos theory pioneer Edward Norton Lorenz, who formulated it in 1963. It is an example of a strange attractor of fractional dimension.

What is an attractor?

A set of physical properties towards which a system tends to evolve, regardless of the initial state.

What maketh an attractor strange?

An attractor is strange if it has a fractal structure.

Leading by Example

Technically, the Lorenz system is an example of a "non-linear dynamic system", and lorenz.py is a great example of how math gets translated into Python.
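To make the "math into Python" point concrete, here is a minimal sketch of the Lorenz equations stepped forward with plain forward-Euler integration. It is not the matplotlib file itself. The parameter values (s=10, r=28, b=2.667), the starting point and the step size are the commonly quoted ones and are assumptions here, as are the function names.

```python
def lorenz_derivatives(x, y, z, s=10.0, r=28.0, b=2.667):
    """Return (dx/dt, dy/dt, dz/dt) for the Lorenz system.

    s, r and b are assumed "classic" parameter values, not taken from the post.
    """
    x_dot = s * (y - x)
    y_dot = r * x - y - x * z
    z_dot = x * y - b * z
    return x_dot, y_dot, z_dot


def integrate_lorenz(start=(0.0, 1.0, 1.05), dt=0.01, steps=10000):
    """Trace a trajectory using forward-Euler steps of size dt."""
    x, y, z = start
    points = [start]
    for _ in range(steps):
        x_dot, y_dot, z_dot = lorenz_derivatives(x, y, z)
        # Euler update: new value = old value + derivative * time step
        x, y, z = x + x_dot * dt, y + y_dot * dt, z + z_dot * dt
        points.append((x, y, z))
    return points


trajectory = integrate_lorenz()
```

Each entry of trajectory is an (x, y, z) point; unzipping the list into three sequences and handing them to a matplotlib 3D axes plot call should draw the familiar butterfly shape.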

Tuesday, 24 January 2012

nltk - Just the Basics

import nltk
nltk.download()

This spawns Edward Loper's Tk download application (works better on Linux than Whine-dows).
Download the "book" collection (about 100MB of space needed).

from nltk.book import *
text1.collocations() - high probability bigrams
text1.concordance("monstrous") - search with in-context results
text1.similar("monstrous") - words that appear in similar contexts
text1.common_contexts(["monstrous", "very"])

You can also do visualisation.

text1.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

Notes

  • You might need to hack and reinstall to get it working in Python 2.5. Basically, os.walk in 2.5 does not accept followlinks=True (that argument was added in Python 2.6).

  • concordance can't process bigrams correctly
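One way around the bigram limitation is to scan the token list yourself. The following is a rough sketch under stated assumptions, not part of NLTK: phrase_concordance is a hypothetical helper, and the token list is a made-up toy example.

```python
def phrase_concordance(tokens, phrase, width=3):
    """Return every occurrence of a multi-word phrase with `width` tokens
    of context on each side. Matching is case-insensitive."""
    target = [w.lower() for w in phrase.split()]
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(tokens) - n + 1):
        window = [t.lower() for t in tokens[i:i + n]]
        if window == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + n:i + n + width]
            hits.append(" ".join(left + tokens[i:i + n] + right))
    return hits


# Toy token list, invented for illustration only
tokens = "the chief mate spoke and the chief nodded to the chief mate".split()
for line in phrase_concordance(tokens, "chief mate", width=2):
    print(line)
```

With the book data loaded, passing text1.tokens instead of the toy list should give in-context hits for phrases such as "chief mate".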

Sunday, 22 January 2012

NLP rocks in Python!

Get on the NLTK Bandwagon!

NLTK is the natural language toolkit. There is an elementary introduction (written by David Mertz) on the IBM developerWorks website.

Long live computational linguistics in Python! Hoo-rah! Now let's start. import nltk. No wait, hold on.

Necessary Pre-requisites
  • PyYAML is a necessary pre-requisite for NLTK (it is a parser and emitter for YAML, of which JSON is roughly a subset). Otherwise you will get an annoying "ImportError: No module named yaml" when you try to python setup.py install on nltk.
  • matplotlib is needed to use the super-cool [text].dispersion_plot() function! It needs at least numpy version 1.1 (I've installed 1.6). Warning: the matplotlib installer is about 100MB. On Windows it needs the MSVC runtime.
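Before running the installer, a quick check can tell you which of these pre-requisites are missing. This is only a sketch for a modern Python 3 interpreter (importlib.util is not available on the old 2.x versions mentioned in these posts), and missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the pre-requisites listed above (PyYAML imports as "yaml")
required = ["yaml", "numpy", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All pre-requisites found.")
```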

Cool Features of NLTK!
  • Collocation extraction: via [text].collocations()
  • Enumerate Concordances: via [text].concordance("test_word") -> doesn't seem to work for phrases though, e.g. "chief mate" finds nothing but "chief" is alright
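To get a feel for what collocation extraction is doing, here is a deliberately simplified sketch that just counts adjacent word pairs. NLTK's own collocations() scores candidates statistically rather than by raw frequency, so treat this as an illustration only; top_bigrams and the toy token list are assumptions made up for the example.

```python
from collections import Counter


def top_bigrams(tokens, n=3, min_len=3):
    """Count adjacent word pairs, skipping pairs that contain punctuation
    or very short words, and return the n most frequent."""
    lowered = [t.lower() for t in tokens]
    counts = Counter(
        (a, b)
        for a, b in zip(lowered, lowered[1:])
        if a.isalpha() and b.isalpha() and len(a) >= min_len and len(b) >= min_len
    )
    return counts.most_common(n)


# Toy token list, invented for illustration only
sample = "sperm whale and sperm whale or white whale".split()
print(top_bigrams(sample, n=1))
```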

nltk - Journey with the jar man

NLTK has some CRAZY (read: AWESOME) integration with the automated theorem-proving package Prover9 (the successor of the Otter theorem prover, built on the "resolution concept" of mathematical logic). To understand resolution, one can read the incomprehensible wiki entry or the superior "Wolfram" version, which PROPERLY attributes the technique to Yorkshire computer genius John Alan Robinson, the "intellectual facilitator" of Prolog.