Saturday, 24 October 2009
LOG-R using numpy (and why it's worth a code jam)
"In statistics, logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve". This is a pretty useless definition (from wikipedia) of a pretty useful concept. Logistic models are used extensively to model growth. Of what, you may well ask. Answer: anything. Business, market share, fish in a pond, if you have a growth process, and ESPECIALLY if the rates of growth are not constant through time (or space for that matter) logistic regression can potentially be applied to model the process.
How
Jeffrey Whitaker has written a module for logistic regression (back in 2006) that uses numpy. You might think the implementation is quite complex but at its heart is just the ubiquitous Newton-Raphson method used to find square roots.
Why
Why produce logistic regressions using this convoluted programming method. Are there not softwares that make this easier? The answer is yes and no. One can analyse data in OpenOffice and fit regression lines. However, some data sets that have "logistic shapes" such as learning curves cannot be modelled correctly in OpenOffice. The regressions supported include linear, logarithmic (function of lnx), exponential (function of e^x) and power regression (y=b*x^a) which will not model logistic data accurately.
Sunday, 27 September 2009
Lamenting the Deprecation of a Brilliantly, Simple Feature: String Exceptions in Python
Friday, 4 September 2009
Solved by Bell Labs
R is a statistical computing language developed at Bell Laboratories. It can do classical hypothesis testing, time-series analysis and regression analysis.
It is based on the S language developed by ACM-award-winning-genius John Chambers (his initials are aptly JMC) and team, for which he received an honorary PhD in mathematics from University of Waterloo in Ontario. John Chambers has also been President of the International Association for Statistical Computing (IASC).
A very good elementary tutorial on R can be found on r-tutor.com. Also good is R by example. Also see the blog by Tim Moertel on R.
R can also do boxplots and other nice graphs.
Python and R Integration
Since R has its intellectual history rooted in Bell Labs, it seems most apt that solutions for Python integration should also emanate from Bell Labs. In this spirit, take a look at Duncan Temple Lang's website. There is also the rpy project on sourceforge.
Saturday, 11 July 2009
Background to Python functools module
>>> import functools
>>> dir(functools)
['WRAPPER_ASSIGNMENTS', 'WRAPPER_UPDATES', '__builtins__', '__doc__', '__file__', '__name__', 'partial', 'update_wrapper', 'wraps']
Python: Old Style and New Style Classes and an Ode to Moses Schonfinkel
The key difference is to understand the difference between function objects and the more general notion of "callables". "Mutation to unbound method objects" only happens to function objects not other "Callables". If we don't want mutation to occur, we must use a Callable that is not a function object, as follows:
class Callable:
def __init__(self, anycallable):
self.__call__ = anycallable
We create an instance of Callable which delegates calls to another Callable it holds.Class methods can then be implemented in the enclosing class as instances of Callable.
class ClassWithStaticMethod:
def staticMethod(name):
print "Hi there",name
staticMethod = Callable(Method)
Of course, an easier solution is to use the @staticmethod decorator to your function.
Lots of good stuff has been written by this man, Alex Martelli from Google, formerly of Open End AB, a Python-oriented software house in Gothenburg, Sweden. For example, check out this recipe on high-performance currying in Python. Haskell Curry (whose name the verb currying is derived) was a writer and teacher of mathematical logic and creator of the lambda calculus of combinatory logic. His intellectual predecessor was Moses Schonfinkel, a Russian logician, and the German mathematican David Hilbert.
Back to old style and new style classes in Python. For old style classes, type(instance variable x) returns "instance" as the type, whereas type(x) for new style classes returns __class__ attribute. To quote the Python documentation, "new-style classes were introduced in Python 2.2 to unify classes and types". Benefits of this approach include the ability to subclass most built-in types. Essentially, new style classes are old style classes with RTTI support.
Taken from the docs: "For compatibility reasons, classes are still old-style by default. New-style classes are created by specifying another new-style class (i.e. a type) as a parent class, or the “top-level type” object if no other parent is needed".
To understand more deeply what Guido alludes to when talking of type/class unification read his explanatory essay.
Thursday, 11 June 2009
Just One Reason Why Python is Better than Java (textwrap)
Thursday, 28 May 2009
Platform Independent Programming with Python
os.path module implements functions on pathnames. This module uses the following modules in different systems i.e. posixpath, ntpath, macpath, os2emxpath.
Wednesday, 6 May 2009
Python Programmatic Compilation
import compileall
compileall.compile_dir( 'code/'. force=True)
compileall is the module to byte-compile Python libraries. Useful in installers and for maintaining the freshnees of the "build" of your Python code.
Saturday, 2 May 2009
Programming the Doomsday Rule
Wednesday, 22 April 2009
Python and Oracle Integration
connect( parameters ) - returns a Connection object
Cursor objects - manage the context of a fetch operation
TPC - two phase commits (TPC should not be confused with two phase locking). TPC relates to distributed transaction management.
Sunday, 29 March 2009
Saturday, 28 March 2009
Sunday, 22 February 2009
Triangle Test in Python
# pre: a,b,c must be in clockwise order
# post: true if p is in the triangle abc
return right_of(p,a,b) and right_of(p,b,c) and right_of(p,c,a)
Saturday, 7 February 2009
Enumerated Types in Python
red,green,blue=range(3)
You can also isolate your enums in a class. e.g.
class Colors:
red,green,blue=range(3)
Colors.red yields 0 and dir(Colors) yields meaningful results.
Useful for programming a finite state machine.
Friday, 6 February 2009
Parsing Libraries in Python: htmllib and sgmllib
Wednesday, 4 February 2009
How Multiple Inheritance Works in Python
statement1
statement2
The resolution rule for class attribute references is as follows (depth-first left to right):
- if attribute not in Derived, check Base1. If not in Base1, recursively check base classes of Base1.
- if not in Base1, check Base2. und so weiter, und so weiter.
Mixin classes are classes designed to be combined with other classes via multiple inheritance. The Tkinter module uses mixins extensively (e.g. the Misc class used as a mixin by the root window and widget classes). Misc provides a large number of Tk and window related services.
Tuesday, 3 February 2009
Wednesday, 28 January 2009
Python Lists: Extend versus Append (and other common operations)
Extend a list with another list: results in one merged list
Appending a list to a list: you are "extending" the list by one element, that element is a list
a.extend(a) -> same as a= a+a, or a = 2*a.
If you want to add a string to a list of strings, you ought to use append. If you use extend it will add each character.
Stacks and Queues
Python lists can be used as stacks using append() and pop(). To use a list as a queue, use append() and pop(0) to remove the first element pushed on to the list (i.e. the head of the queue).
Functional Primitives (with examples)
map, filter and reduce allow you to do functional programming in Python. For example, if you want to compute cubes between 1 and 10, you can use map as follows:
def cube(x): return x*x*x
map(cube, range(1,10))
map will a unary function onto a list, a binary function onto two lists, ..., n-ary function onto n-lists etc.
reduce is a nice way to "roll up" a list.
>>> def add(x,y): return x+y...>>> reduce(add, range(1, 101)) 5050
is easily verified mentally using arithmetic series. The add function shows the power of dynamic languages. It is a very simple function that exhibits run-time polymorphism. If add is called with integers it returns an integer, if it is called with lists it extends one list with another list (i.e. concatenates two lists). Beautiful.
Reversing Lists
list.reverse() -> reverse in-place
list[::-1] -> reverse, producing a new list. This uses the "extended slice notation".
Removing items from Lists
list.remove(x) - removes the first occurrence of x in the list, provided the element exists, else a ValueError exception is thrown. Be careful if you remove an element from within a loop. For example, suppose you have a list consisting of the letters m, a, n, g, o. If you naively write a loop to delete every one character element, you will end up removing alternate letters and i.e. m, n and o, because each time you remove an element, the elements re-index and the index pointer is fixed.
If you want to delete all one character elements from a list, ["apple", "m", "a", "n", "g", "o"] the easiest way is to create a new list and add all non-one character elements to it.
To remove duplicates, either push the list into the keys of a dictionary, or do:
a = list(set(a)) in Python 2.5 onwards.
Other Fun Stuff on Lists
list.count("cat") - count the number of times "cat" appears in the list ( exact string searched for, case-sensitive)
Slice Notation
fruits[1:2] will return a list of all elements from index 1 up to (but not including) fruits[2]. So len(fruits[1:2]) is 1.
Slice notation can also be used to copy lists e.g. g= f[:]. Otherwise you are just copying the references (g=f creates g as an alias to f). A change to one list will be reflected in the other list. Sort one list, it is sorted in the other list.
Sorting Lists
list.sort() will sort a list in place. sorted(list) will return a new sorted list.
Custom sorts are easy to program. For example, if we want to sort a list of strings by length then do: def len_compare(x,y): return len(x)-len(y). Then a.sort(len_compare) will sort the list in order of increasing length and a.sort(len_compare, reverse=True) will sort the list in order of decreasing length.
Generating Random Lists
a=[], for x in range(10): a.append(random.randint(1,10)) will generate a list of 10 random integers from 1 to 10. random.choice(a) will then select a number at random from a. random.shuffle() will shuffle the elements in the list. random.sample(a, 10) will do sampling without replacement of 10 elements from population a.
Lists of characters
To concatenate characters in a list into a string e.g a= ['m', 'a', 'n', 'g', 'o'], do: import string; string.join(a, '') i.e. join elements in the list using the null separator (otherwise it will insert spaces in between the characters).
Lock-step iteration in Python
Parallel or lock-step iteration between 2 or more lists can be achieved using zip function. e.g.
import sys
spaghetti = [ 1,2,5,3 ] meatballs = [ 2,3,4,5 ] mozarella = [ 3,4,5,6 ]
def main(argv):
for t in zip(spaghetti, meatballs, mozarella): print t
main(sys.argv)
The enumerate method on Lists
a=['apple'. 'orange']
for i,v in enumerate(a):
print i, v
Iterating a List in Reverse
for i in reversed(a):
print i
reversed(a) returns a listreverseiterator object. This brings us to the following question.
How does iteration work in Python?
The for statement calls the iter() method on the container object. The iter() method returns an iterator object that defines the next() method which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the loop to terminate.
e.g. r = reversed(a); r.next()
Sunday, 25 January 2009
Python 2.6 Good Stuff
import json
http://json.org/ (JavaScript object notation). This is a data format with two basic data structures: dictionary and list. PyDoc for this is here. The original module was written by Bob Ippolito (Mochi Media Tech-Master).
from mulitprocessing import Process, Queue
The multiprocessing module is pretty exciting. It allows Python programs to create new processes and return results to parent. It started out as an exact emulation of the threading module but the goal was dropped. The fundamental class is Process which is passed a callable object and a collection of arguments.
import ast
Provides an abstract syntax tree of Python code. t = ast.parse(""" some code here """). print ast.dump(t). Useful for code analyzers.
Better support for the Secure Sockets Layer (import ssl). The module is built on top of OpenSSL. Use it to write secure web servers.
There are lots of enhancements to other modules too, e.g. the turtle module. (Turtle was part of the Logo programmining language of 1966).
Check out some cool papers by Logo-master Seymour Papert or some articles from GDmag.
Thursday, 8 January 2009
Markov Chain Compression with LZMA
Until the next new compression algo is discovered, catch up on some spectral analysis.