Saturday 24 October 2009

LOG-R using numpy (and why it's worth a code jam)

What

"In statistics, logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve". This is a pretty useless definition (from wikipedia) of a pretty useful concept. Logistic models are used extensively to model growth. Of what, you may well ask. Answer: anything. Business, market share, fish in a pond, if you have a growth process, and ESPECIALLY if the rates of growth are not constant through time (or space for that matter) logistic regression can potentially be applied to model the process.

How

Jeffrey Whitaker has written a module for logistic regression (back in 2006) that uses numpy. You might think the implementation is quite complex but at its heart is just the ubiquitous Newton-Raphson method used to find square roots.

Why

Why produce logistic regressions using this convoluted programming method. Are there not softwares that make this easier? The answer is yes and no. One can analyse data in OpenOffice and fit regression lines. However, some data sets that have "logistic shapes" such as learning curves cannot be modelled correctly in OpenOffice. The regressions supported include linear, logarithmic (function of lnx), exponential (function of e^x) and power regression (y=b*x^a) which will not model logistic data accurately.

Sunday 27 September 2009

Lamenting the Deprecation of a Brilliantly, Simple Feature: String Exceptions in Python

Python used to allow string exceptions (raise "Exception thrown!") but now these are deprecated. You have to raise real exceptions. BaseException is used as base for built-in exceptions.

Friday 4 September 2009

Solved by Bell Labs

What is R and why do I need it

R is a statistical computing language developed at Bell Laboratories. It can do classical hypothesis testing, time-series analysis and regression analysis.

It is based on the S language developed by ACM-award-winning-genius John Chambers (his initials are aptly JMC) and team, for which he received an honorary PhD in mathematics from University of Waterloo in Ontario. John Chambers has also been President of the International Association for Statistical Computing (IASC).

A very good elementary tutorial on R can be found on r-tutor.com. Also good is R by example. Also see the blog by Tim Moertel on R.

R can also do boxplots and other nice graphs.

Python and R Integration

Since R has its intellectual history rooted in Bell Labs, it seems most apt that solutions for Python integration should also emanate from Bell Labs. In this spirit, take a look at Duncan Temple Lang's website. There is also the rpy project on sourceforge.

Saturday 11 July 2009

Background to Python functools module

To know the intellectual history of functools development read this (PEP309).
>>> import functools
>>> dir(functools)
['WRAPPER_ASSIGNMENTS', 'WRAPPER_UPDATES', '__builtins__', '__doc__', '__file__', '__name__', 'partial', 'update_wrapper', 'wraps']

Python: Old Style and New Style Classes and an Ode to Moses Schonfinkel

If I define a class C and function called static_method() and do C.static_method() the result is TypeError. static_method has become a special kind of method object, called an unbound-method object, not a plain function. When you call a method on an object, the first argument is self, and here self returns null. How does one avoid this problem?

The key difference is to understand the difference between function objects and the more general notion of "callables". "Mutation to unbound method objects" only happens to function objects not other "Callables". If we don't want mutation to occur, we must use a Callable that is not a function object, as follows:

class Callable:
def __init__(self, anycallable):
self.__call__ = anycallable


We create an instance of Callable which delegates calls to another Callable it holds.Class methods can then be implemented in the enclosing class as instances of Callable.

class ClassWithStaticMethod:
def staticMethod(name):
print "Hi there",name
staticMethod = Callable(Method)


Of course, an easier solution is to use the @staticmethod decorator to your function.

Lots of good stuff has been written by this man, Alex Martelli from Google, formerly of Open End AB, a Python-oriented software house in Gothenburg, Sweden. For example, check out this recipe on high-performance currying in Python. Haskell Curry (whose name the verb currying is derived) was a writer and teacher of mathematical logic and creator of the lambda calculus of combinatory logic. His intellectual predecessor was Moses Schonfinkel, a Russian logician, and the German mathematican David Hilbert.

Back to old style and new style classes in Python. For old style classes, type(instance variable x) returns "instance" as the type, whereas type(x) for new style classes returns __class__ attribute. To quote the Python documentation, "new-style classes were introduced in Python 2.2 to unify classes and types". Benefits of this approach include the ability to subclass most built-in types. Essentially, new style classes are old style classes with RTTI support.

Taken from the docs: "For compatibility reasons, classes are still old-style by default. New-style classes are created by specifying another new-style class (i.e. a type) as a parent class, or the “top-level type” object if no other parent is needed".

To understand more deeply what Guido alludes to when talking of type/class unification read his explanatory essay.

Thursday 11 June 2009

Just One Reason Why Python is Better than Java (textwrap)

Java core libraries do not support word wrap, although Apache Commons Lang project has a function to do this. Python, however, supports word wrap in its core libraries since version 2.3. See the textwrap module. Viva la Python.

Thursday 28 May 2009

Platform Independent Programming with Python

I am interested in platform-independent programming in Python, in particular file and directory manipulations.

os.path module implements functions on pathnames. This module uses the following modules in different systems i.e. posixpath, ntpath, macpath, os2emxpath.

py2exe

py2exe lets you run python programs as native, using an exe and a python???.dll.

Wednesday 6 May 2009

Python Programmatic Compilation

To ensure all your pyc's are up-to-date:

import compileall
compileall.compile_dir( 'code/'. force=True)

compileall is the module to byte-compile Python libraries. Useful in installers and for maintaining the freshnees of the "build" of your Python code.

Saturday 2 May 2009

Programming the Doomsday Rule

The Doomsday algorithm is designed for calculating the day of the week given a date. Here are the details.

Wednesday 22 April 2009

Python and Oracle Integration

cx_Oracle is a Python Database API 2.0-compliant extension module to connect to Oracle databases. Basics of Python Database API 2.0:

connect( parameters ) - returns a Connection object
Cursor objects - manage the context of a fetch operation
TPC - two phase commits (TPC should not be confused with two phase locking). TPC relates to distributed transaction management.

Sunday 29 March 2009

Bayesian Spam Filtering

naive bayes classifiers are being used for spam and rss feed filtering.

Sunday 22 February 2009

Triangle Test in Python

def in_triangle(p,a,b,c):
# pre: a,b,c must be in clockwise order
# post: true if p is in the triangle abc
return right_of(p,a,b) and right_of(p,b,c) and right_of(p,c,a)

Saturday 7 February 2009

Enumerated Types in Python

How do we do enumerated types work in Python? Well, there is no enum keyword as such but there are many ways to get the same effect.

red,green,blue=range(3)

You can also isolate your enums in a class. e.g.

class Colors:
red,green,blue=range(3)

Colors.red yields 0 and dir(Colors) yields meaningful results.
Useful for programming a finite state machine.

Friday 6 February 2009

Parsing Libraries in Python: htmllib and sgmllib

Various structure markup processing tools are available in Python. Amongst these are htmllib and sgmllib. There's also an xmllib.

Wednesday 4 February 2009

How Multiple Inheritance Works in Python

class Derived(Base1, Base2, Base3):
statement1
statement2

The resolution rule for class attribute references is as follows (depth-first left to right):
  1. if attribute not in Derived, check Base1. If not in Base1, recursively check base classes of Base1.
  2. if not in Base1, check Base2. und so weiter, und so weiter.

Mixin classes are classes designed to be combined with other classes via multiple inheritance. The Tkinter module uses mixins extensively (e.g. the Misc class used as a mixin by the root window and widget classes). Misc provides a large number of Tk and window related services.

Tuesday 3 February 2009

Wednesday 28 January 2009

Python Lists: Extend versus Append (and other common operations)

Extend and Append

Extend a list with another list: results in one merged list
Appending a list to a list: you are "extending" the list by one element, that element is a list

a.extend(a) -> same as a= a+a, or a = 2*a.

If you want to add a string to a list of strings, you ought to use append. If you use extend it will add each character.

Stacks and Queues

Python lists can be used as stacks using append() and pop(). To use a list as a queue, use append() and pop(0) to remove the first element pushed on to the list (i.e. the head of the queue).

Functional Primitives (with examples)

map, filter and reduce allow you to do functional programming in Python. For example, if you want to compute cubes between 1 and 10, you can use map as follows:

def cube(x): return x*x*x
map(cube, range(1,10))

map will a unary function onto a list, a binary function onto two lists, ..., n-ary function onto n-lists etc.

reduce is a nice way to "roll up" a list.

>>> def add(x,y): return x+y...>>> reduce(add, range(1, 101)) 5050

is easily verified mentally using arithmetic series. The add function shows the power of dynamic languages. It is a very simple function that exhibits run-time polymorphism. If add is called with integers it returns an integer, if it is called with lists it extends one list with another list (i.e. concatenates two lists). Beautiful.

Reversing Lists

list.reverse() -> reverse in-place
list[::-1] -> reverse, producing a new list. This uses the "extended slice notation".

Removing items from Lists

list.remove(x) - removes the first occurrence of x in the list, provided the element exists, else a ValueError exception is thrown. Be careful if you remove an element from within a loop. For example, suppose you have a list consisting of the letters m, a, n, g, o. If you naively write a loop to delete every one character element, you will end up removing alternate letters and i.e. m, n and o, because each time you remove an element, the elements re-index and the index pointer is fixed.

If you want to delete all one character elements from a list, ["apple", "m", "a", "n", "g", "o"] the easiest way is to create a new list and add all non-one character elements to it.

To remove duplicates, either push the list into the keys of a dictionary, or do:
a = list(set(a)) in Python 2.5 onwards.

Other Fun Stuff on Lists

list.count("cat") - count the number of times "cat" appears in the list ( exact string searched for, case-sensitive)

Slice Notation

fruits[1:2] will return a list of all elements from index 1 up to (but not including) fruits[2]. So len(fruits[1:2]) is 1.

Slice notation can also be used to copy lists e.g. g= f[:]. Otherwise you are just copying the references (g=f creates g as an alias to f). A change to one list will be reflected in the other list. Sort one list, it is sorted in the other list.

Sorting Lists

list.sort() will sort a list in place. sorted(list) will return a new sorted list.

Custom sorts are easy to program. For example, if we want to sort a list of strings by length then do: def len_compare(x,y): return len(x)-len(y). Then a.sort(len_compare) will sort the list in order of increasing length and a.sort(len_compare, reverse=True) will sort the list in order of decreasing length.

Generating Random Lists

a=[], for x in range(10): a.append(random.randint(1,10)) will generate a list of 10 random integers from 1 to 10. random.choice(a) will then select a number at random from a. random.shuffle() will shuffle the elements in the list. random.sample(a, 10) will do sampling without replacement of 10 elements from population a.

Lists of characters

To concatenate characters in a list into a string e.g a= ['m', 'a', 'n', 'g', 'o'], do: import string; string.join(a, '') i.e. join elements in the list using the null separator (otherwise it will insert spaces in between the characters).

Lock-step iteration in Python

Lock-step iteration

Parallel or lock-step iteration between 2 or more lists can be achieved using zip function. e.g.

import sys
spaghetti = [ 1,2,5,3 ] meatballs = [ 2,3,4,5 ] mozarella = [ 3,4,5,6 ]

def main(argv):
for t in zip(spaghetti, meatballs, mozarella): print t

main(sys.argv)


The enumerate method on Lists

a=['apple'. 'orange']
for i,v in enumerate(a):
print i, v

Iterating a List in Reverse

for i in reversed(a):
print i

reversed(a) returns a listreverseiterator object. This brings us to the following question.

How does iteration work in Python?

The for statement calls the iter() method on the container object. The iter() method returns an iterator object that defines the next() method which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the loop to terminate.

e.g. r = reversed(a); r.next()

Sunday 25 January 2009

Python 2.6 Good Stuff

import json

http://json.org/ (JavaScript object notation). This is a data format with two basic data structures: dictionary and list. PyDoc for this is here. The original module was written by Bob Ippolito (Mochi Media Tech-Master).

from mulitprocessing import Process, Queue

The multiprocessing module is pretty exciting. It allows Python programs to create new processes and return results to parent. It started out as an exact emulation of the threading module but the goal was dropped. The fundamental class is Process which is passed a callable object and a collection of arguments.

import ast

Provides an abstract syntax tree of Python code. t = ast.parse(""" some code here """). print ast.dump(t). Useful for code analyzers.

Better support for the Secure Sockets Layer (import ssl). The module is built on top of OpenSSL. Use it to write secure web servers.

There are lots of enhancements to other modules too, e.g. the turtle module. (Turtle was part of the Logo programmining language of 1966).

Check out some cool papers by Logo-master Seymour Papert or some articles from GDmag.

Thursday 8 January 2009

Markov Chain Compression with LZMA

Based off LZ77 and LZ78 described here. Thanks Jacob Ziv. Unfortunately LZMA is slow to compress, though compression ratio beats bzip.

Until the next new compression algo is discovered, catch up on some spectral analysis.