Wednesday 28 January 2009

Python Lists: Extend versus Append (and other common operations)

Extend and Append

Extend a list with another list: results in one merged list
Appending a list to a list: you are "extending" the list by one element, that element is a list

a.extend(a) -> same as a= a+a, or a = 2*a.

If you want to add a string to a list of strings, you ought to use append. If you use extend it will add each character.

Stacks and Queues

Python lists can be used as stacks using append() and pop(). To use a list as a queue, use append() and pop(0) to remove the first element pushed on to the list (i.e. the head of the queue).

Functional Primitives (with examples)

map, filter and reduce allow you to do functional programming in Python. For example, if you want to compute cubes between 1 and 10, you can use map as follows:

def cube(x): return x*x*x
map(cube, range(1,10))

map will a unary function onto a list, a binary function onto two lists, ..., n-ary function onto n-lists etc.

reduce is a nice way to "roll up" a list.

>>> def add(x,y): return x+y...>>> reduce(add, range(1, 101)) 5050

is easily verified mentally using arithmetic series. The add function shows the power of dynamic languages. It is a very simple function that exhibits run-time polymorphism. If add is called with integers it returns an integer, if it is called with lists it extends one list with another list (i.e. concatenates two lists). Beautiful.

Reversing Lists

list.reverse() -> reverse in-place
list[::-1] -> reverse, producing a new list. This uses the "extended slice notation".

Removing items from Lists

list.remove(x) - removes the first occurrence of x in the list, provided the element exists, else a ValueError exception is thrown. Be careful if you remove an element from within a loop. For example, suppose you have a list consisting of the letters m, a, n, g, o. If you naively write a loop to delete every one character element, you will end up removing alternate letters and i.e. m, n and o, because each time you remove an element, the elements re-index and the index pointer is fixed.

If you want to delete all one character elements from a list, ["apple", "m", "a", "n", "g", "o"] the easiest way is to create a new list and add all non-one character elements to it.

To remove duplicates, either push the list into the keys of a dictionary, or do:
a = list(set(a)) in Python 2.5 onwards.

Other Fun Stuff on Lists

list.count("cat") - count the number of times "cat" appears in the list ( exact string searched for, case-sensitive)

Slice Notation

fruits[1:2] will return a list of all elements from index 1 up to (but not including) fruits[2]. So len(fruits[1:2]) is 1.

Slice notation can also be used to copy lists e.g. g= f[:]. Otherwise you are just copying the references (g=f creates g as an alias to f). A change to one list will be reflected in the other list. Sort one list, it is sorted in the other list.

Sorting Lists

list.sort() will sort a list in place. sorted(list) will return a new sorted list.

Custom sorts are easy to program. For example, if we want to sort a list of strings by length then do: def len_compare(x,y): return len(x)-len(y). Then a.sort(len_compare) will sort the list in order of increasing length and a.sort(len_compare, reverse=True) will sort the list in order of decreasing length.

Generating Random Lists

a=[], for x in range(10): a.append(random.randint(1,10)) will generate a list of 10 random integers from 1 to 10. random.choice(a) will then select a number at random from a. random.shuffle() will shuffle the elements in the list. random.sample(a, 10) will do sampling without replacement of 10 elements from population a.

Lists of characters

To concatenate characters in a list into a string e.g a= ['m', 'a', 'n', 'g', 'o'], do: import string; string.join(a, '') i.e. join elements in the list using the null separator (otherwise it will insert spaces in between the characters).

Lock-step iteration in Python

Lock-step iteration

Parallel or lock-step iteration between 2 or more lists can be achieved using zip function. e.g.

import sys
spaghetti = [ 1,2,5,3 ] meatballs = [ 2,3,4,5 ] mozarella = [ 3,4,5,6 ]

def main(argv):
for t in zip(spaghetti, meatballs, mozarella): print t

main(sys.argv)


The enumerate method on Lists

a=['apple'. 'orange']
for i,v in enumerate(a):
print i, v

Iterating a List in Reverse

for i in reversed(a):
print i

reversed(a) returns a listreverseiterator object. This brings us to the following question.

How does iteration work in Python?

The for statement calls the iter() method on the container object. The iter() method returns an iterator object that defines the next() method which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the loop to terminate.

e.g. r = reversed(a); r.next()

Sunday 25 January 2009

Python 2.6 Good Stuff

import json

http://json.org/ (JavaScript object notation). This is a data format with two basic data structures: dictionary and list. PyDoc for this is here. The original module was written by Bob Ippolito (Mochi Media Tech-Master).

from mulitprocessing import Process, Queue

The multiprocessing module is pretty exciting. It allows Python programs to create new processes and return results to parent. It started out as an exact emulation of the threading module but the goal was dropped. The fundamental class is Process which is passed a callable object and a collection of arguments.

import ast

Provides an abstract syntax tree of Python code. t = ast.parse(""" some code here """). print ast.dump(t). Useful for code analyzers.

Better support for the Secure Sockets Layer (import ssl). The module is built on top of OpenSSL. Use it to write secure web servers.

There are lots of enhancements to other modules too, e.g. the turtle module. (Turtle was part of the Logo programmining language of 1966).

Check out some cool papers by Logo-master Seymour Papert or some articles from GDmag.

Thursday 8 January 2009

Markov Chain Compression with LZMA

Based off LZ77 and LZ78 described here. Thanks Jacob Ziv. Unfortunately LZMA is slow to compress, though compression ratio beats bzip.

Until the next new compression algo is discovered, catch up on some spectral analysis.