NLTK is the Natural Language Toolkit. There is an elementary introduction (written by David Mertz) on the IBM developerWorks website.
Long live computational linguistics in Python! Hoo-rah! Now let's start. import nltk. No wait, hold on.
Necessary Pre-requisites
- PyYAML is a necessary prerequisite for NLTK (it is a YAML parser and emitter; YAML is roughly a superset of JSON). Otherwise you will get an annoying "ImportError: No module named yaml" when you try to run python setup.py install on nltk.
- matplotlib is needed to use the super-cool [text].dispersion_plot() function! It needs at least numpy version 1.1 (I've installed 1.6). Warning: the matplotlib installer is about 100MB. On Windows it needs the MSVC runtime.
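Before running the NLTK installer, it can save some grief to check whether those prerequisites are importable. Here is a minimal sketch using only the standard library; the helper name missing_modules is my own invention, and the module names (yaml for PyYAML, numpy, matplotlib) are assumed to be the import names for the packages listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found on this system."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the prerequisites mentioned above
required = ["yaml", "numpy", "matplotlib"]
missing = missing_modules(required)

if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All prerequisites found, go ahead and install nltk.")
```

This only tells you whether the modules are present, not whether their versions are recent enough.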
Cool Features of NLTK!
- Collocation extraction: via [text].collocations()
- Enumerate concordances: via [text].concordance("test_word"). It doesn't seem to work for phrases though, e.g. "chief mate" finds nothing but "chief" is alright.
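Since concordance() did not handle phrases for me, here is a rough workaround sketch in plain Python that scans a list of tokens for a multi-word phrase. This is not NLTK's own implementation; the function phrase_concordance and its context parameter are hypothetical names of mine, and the token list is a toy sentence made up for illustration. In practice you would pass in whatever token list you built your text from.

```python
def phrase_concordance(tokens, phrase, context=3):
    """Return (left, match, right) triples for each occurrence of a phrase.

    Matching is case-insensitive; `context` is the number of tokens
    shown on each side of the match.
    """
    target = [word.lower() for word in phrase.split()]
    n = len(target)
    if n == 0:
        return []  # an empty phrase matches nothing
    hits = []
    for i in range(len(tokens) - n + 1):
        window = [tok.lower() for tok in tokens[i:i + n]]
        if window == target:
            left = " ".join(tokens[max(0, i - context):i])
            match = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + context])
            hits.append((left, match, right))
    return hits


# Toy token list, made up for illustration
tokens = "The chief mate spoke and the chief listened".split()

for left, match, right in phrase_concordance(tokens, "chief mate"):
    print(f"{left:>20} [{match}] {right}")
```

A single word still works with the same function, so phrase_concordance(tokens, "chief") returns both occurrences in the toy sentence.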