Sunday 22 January 2012

NLP rocks in Python!

Get on the NLTK Bandwagon!

NLTK is the Natural Language Toolkit. There is an elementary introduction (written by David Mertz) on the IBM developerWorks website.

Long live computational linguistics in Python! Hoo-rah! Now let's start. import nltk. No wait, hold on.

Necessary Pre-requisites
  • PyYAML is a necessary prerequisite for NLTK (it is a YAML parser and emitter; YAML is roughly a superset of JSON). Otherwise you will get an annoying "ImportError: No module named yaml" when you try to run python setup.py install on nltk.
  • matplotlib is needed to use the super-cool [text].dispersion_plot() function! It needs at least numpy version 1.1 (I've installed 1.6). Warning: the matplotlib installer is about 100MB, and on Windows it needs the MSVC runtime.
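
Before running the installer, it can save some head-scratching to check which of these prerequisites are already importable. Here is a minimal sketch using only the standard library. It assumes Python 3.4 or later (for importlib.util.find_spec), and the REQUIRED table and the missing_modules helper are names made up for this illustration, not part of NLTK.

```python
import importlib.util

# Import names, not package names: PyYAML is imported as "yaml".
# The descriptions are just reminders taken from the list above.
REQUIRED = {
    "yaml": "PyYAML, needed when installing NLTK",
    "numpy": "needed by matplotlib",
    "matplotlib": "needed for dispersion_plot()",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
for name in missing:
    print("Missing: {} ({})".format(name, REQUIRED[name]))
if not missing:
    print("All prerequisites found.")
```

This only checks that the modules can be located. It does not check version numbers, so the numpy 1.1 minimum still has to be verified by hand.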

Cool Features of NLTK!
  • Collocation extraction: via [text].collocations()
  • Enumerate concordances: via [text].concordance("test_word"). This doesn't seem to work for phrases though, e.g. "chief mate" finds nothing but "chief" is alright.
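
Since phrases tripped me up with concordance(), here is a rough workaround in plain Python. To be clear, this is not how NLTK does it and it does not use the NLTK API at all: phrase_concordance is a hypothetical helper written for this post, and the sample tokens are a toy example. It assumes the text is already split into a list of tokens and matches the phrase case-insensitively.

```python
def phrase_concordance(tokens, phrase, width=4):
    """Return (left, match, right) context strings for each hit of phrase.

    tokens: list of word strings; phrase: whitespace-separated words;
    width: number of context tokens to keep on each side.
    """
    target = [word.lower() for word in phrase.split()]
    n = len(target)
    if n == 0:
        return []
    lowered = [token.lower() for token in tokens]
    hits = []
    for i in range(len(tokens) - n + 1):
        # Compare a window of n tokens against the phrase.
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, match, right))
    return hits


# Toy, pre-tokenized sample (punctuation already split off).
sample = ("The chief mate of the Pequod was Starbuck . "
          "The chief harpooneer stood by the mate .").split()

for left, match, right in phrase_concordance(sample, "chief mate"):
    print("{:>30} [{}] {}".format(left, match, right))
```

A single word works too, so phrase_concordance(sample, "chief") returns both occurrences of "chief", while "chief mate" returns only the first.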
