Monday 29 December 2008

Numerical Methods on Multiple Machines

Linear algebra routines can be parallelized and run on a grid using the theory of parallel algorithms.
http://www.cs.utexas.edu/users/plapack/papers/ipps98/ipps98.html

This is Cannon's algorithm for matrix multiplication.
http://orca.st.usm.edu/~seyfarth/sc730/cannon.html
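
To make the idea concrete, here is a single-process sketch of Cannon's block-shifting scheme (an assumption-laden toy: on a real grid each block lives on its own processor and the "shifts" are messages, whereas here they are just list rotations):

```python
def cannon_matmul(A, B, p):
    """Simulate Cannon's algorithm for C = A*B on a p x p grid of blocks.

    On a real machine each block would sit on its own processor and the
    shifts would be point-to-point messages; here they are list rotations.
    Assumes p divides the matrix dimension n.
    """
    n = len(A)
    b = n // p  # block size

    def block(M, i, j):
        return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]

    # Distribute blocks with the initial alignment:
    # row i of A shifts left by i, column j of B shifts up by j.
    Ab = [[block(A, i, (j + i) % p) for j in range(p)] for i in range(p)]
    Bb = [[block(B, (i + j) % p, j) for j in range(p)] for i in range(p)]
    Cb = [[[[0] * b for _ in range(b)] for _ in range(p)] for _ in range(p)]

    for _ in range(p):
        # Each "processor" multiplies its local blocks and accumulates.
        for i in range(p):
            for j in range(p):
                for r in range(b):
                    for c in range(b):
                        Cb[i][j][r][c] += sum(
                            Ab[i][j][r][k] * Bb[i][j][k][c] for k in range(b))
        # Shift A blocks one step left and B blocks one step up.
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    # Reassemble the full result matrix.
    C = [[0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            for r in range(b):
                for c in range(b):
                    C[i * b + r][j * b + c] = Cb[i][j][r][c]
    return C
```

At every step, processor (i, j) holds A-block (i, (i+j+t) mod p) and B-block ((i+j+t) mod p, j), so over p steps it accumulates exactly the block products it needs.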

Monte Carlo computation is also a good candidate for parallel processing, since samples can be generated independently on each machine.
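
The classic toy example is estimating pi by sampling the unit square; a minimal sketch (the seeding convention and averaging of partial results across workers are my assumptions, not a particular framework's API):

```python
import random

def estimate_pi(samples, seed=None):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the quarter circle, times 4.  In a
    parallel run, each worker would call this with its own seed and the
    partial estimates would simply be averaged."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

The error shrinks like 1/sqrt(samples), which is exactly why throwing more machines at independent sample streams works so well.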

The following Java programs (for computational physics) are not parallelised but are interesting nonetheless.

http://www.physics.unlv.edu/~pang/cp2_j.html

Wednesday 24 December 2008

Programming Stochastic Processes - including HMMs

An excellent course on stochastic processes from the Technical University of Denmark is here.

Lecture 12 explains Hidden Markov Models.

A lot of work is done to build the foundations of DTMC and CTMC (discrete- and continuous- time Markov chains).

Another great book on stochastic processes with lots of example applications (e.g. from physics and electrical engineering) is by Emmanuel Parzen at Texas A&M University.

A particularly interesting anecdote from the book is how Einstein's equation involving the Wiener process was used to deduce Avogadro's number from Brownian motion experiments. Avogadro's number is the number of particles (atoms or molecules) in one mole of a substance.

Knuth Volume 2 (Seminumerical Algorithms) describes the basic stochastic simulation techniques.

Volume 2 also contains an unusually deep analysis of Euclid's algorithm for computing the gcd (one of the oldest known algorithms) and of prime factorisation.
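
Euclid's algorithm itself fits in a few lines; a quick sketch for reference:

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
    until the remainder is zero; the last non-zero value is the gcd."""
    while b:
        a, b = b, a % b
    return a
```

Knuth's analysis concerns how many of these division steps are needed in the worst and average cases (Fibonacci-number inputs are the worst case).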

Tuesday 23 December 2008

SAGE and Cython - Mathematics in Python

SAGE is an open source mathematics system with a Python-based interface (95% of the code is Python).

http://www.sagemath.org/download-source.html


You will see in the source code to SAGE that there are .pxd files, used to declare functions and classes in a C-extension module. This is a Cython convention. Cython makes it easy to write C-extensions for Python.

Sunday 23 November 2008

Math and CS Research Departments

Princeton

Some notable contemporary names at Princeton include Brian Kernighan (of awk fame - Aho, Weinberger and Kernighan) and Andrew Appel. BK is also involved in a very interesting project called AMPL, a modelling language for mathematical programming.

Princeton is also famous as the place where Alonzo Church, the American mathematician, did his PhD in mathematics; in 1936 he published a paper on an "unsolvable problem" which introduced the lambda calculus, later the inspiration for the LISP programming language.

http://www.cs.princeton.edu/

Inria

Inria has published an interesting series of monographs for 2008, including one on the use of temporal logic as a verification tool.
http://www.inria.fr/publications/editeurs/edit_monographies.fr.html

UNSW

Project on compiling Haskell to Java.
http://www.cse.unsw.edu.au/~pls/thesis-topics/ghcjava.html

Friday 21 November 2008

Project Martingale (Java API for Derivatives Pricing)

http://martingale.berlios.de/Martingale.html#code

Sample signatures: public class BrownianMotion extends StochasticProcess; public class DigitalRandomSequence extends LowDiscrepancySequence (with methods to compute L2-discrepancy). It utilises JFreeChart, created by Object Refinery, based here.

Saturday 15 November 2008

Basic Calculus and Vector Calculus Concepts for Computer Programmers

The MIT website outlines the basic rules of differentiation, which should be second nature to any mathematical programmer.

It revises rules like the product rule (to differentiate x·ln(x), breaking the derivative of the product into a sum of two terms), the chain rule (to differentiate sin(ln(x))) and the quotient rule (to differentiate ln(x)/cos(x)).

Programmers should also have a basic knowledge of limits. Like... what is the limit of sin(x) as x tends to zero? (You can guess this one from the graph.) Trickier: what is the limit of sin(x)/x as x tends to zero? Don't know? Find out here.
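
You can probe both limits numerically in a couple of lines (a table of values is evidence, not a proof, of course):

```python
import math

# Numerically probe the limits of sin(x) and sin(x)/x as x -> 0.
# sin(x) -> 0, while sin(x)/x -> 1: the classic result needed to
# differentiate sin from first principles.
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x, math.sin(x), math.sin(x) / x)
```

The second column shrinks towards 0 while the third creeps towards 1, matching the Taylor expansion sin(x)/x = 1 - x²/6 + ...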

Concepts of vector calculus (a.k.a. vector analysis, which has its roots in the theory of quaternions) like scalar and vector fields and grad (the "nabla" operator), div ("divergence") and curl should also be clearly understood, with examples worked through. The link above also covers basic 2D and 3D coordinate systems (i.e. cylindrical and spherical coordinates versus Cartesian and polar coordinates) essential in graphics programming.

An example of a scalar field would be temperature in a room, where a scalar temperature (in kelvins, say) can be associated with every point in the room. An example of a vector field would be an electrical force field, where each point carries a vector with a magnitude and a direction in which the force acts.
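
As a programmer's sanity check, the gradient of a scalar field can be approximated with central differences; a minimal sketch, with a toy "temperature" field of my own invention:

```python
def grad(f, p, h=1e-6):
    """Approximate the gradient of a scalar field f at point p using
    central differences: (f(p + h*e_i) - f(p - h*e_i)) / (2h)."""
    g = []
    for i in range(len(p)):
        up = list(p); up[i] += h
        dn = list(p); dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

# A toy scalar field: "temperature" grows with squared distance from
# the origin, so its gradient at (x, y, z) should be (2x, 2y, 2z).
def temperature(p):
    x, y, z = p
    return x * x + y * y + z * z
```

For example, grad(temperature, [1.0, 2.0, 0.0]) comes out very close to [2.0, 4.0, 0.0], pointing in the direction of fastest temperature increase.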

Puzzle: is light a vector or a scalar field? Why?

Friday 14 November 2008

K-Maps

I was reading about black-box testing when I came across a technique involving Karnaugh maps
(K-maps, or Veitch diagrams). These optimisation diagrams were invented by American mathematician Edward Veitch in 1952 and further refined by Maurice Karnaugh (BSc, MSc, PhD in Physics) at Bell Labs (now part of Alcatel-Lucent). In K-maps, boolean variables are taken (e.g. from a truth table) and ordered by Gray code (a.k.a. reflected binary code) principles, where successive members of the code differ in only one digit.

Here is a screenshot of software that creates K-maps: http://www.sharewareplaza.com/images/screenshot/44948.gif

Tuesday 11 November 2008

Functional Analysis, Operational Calculus and Operator Theory

Banach spaces feature prominently in courses on functional analysis, including theoretical numerical analysis. Stefan Banach initially referred to these as "spaces of type B".

Functional analysis is basically the study of vector spaces and the operators that act upon them. The word "functional" originates from the calculus of variations, which deals in higher-order functions, or functions whose arguments are functions.

Here is a history of the related domain of operator theory.

The article makes mention of Oliver Heaviside's operational calculus of which a short summary can be found here. Oliver Heaviside was born in Camden Town, London in 1850 (near what is now Mornington Crescent) and worked in Newcastle as a telegraph operator. Interested in telegraphy, mathematics and electrical engineering, he quit his job and studied full-time from home. He invented transmission line theory and developed techniques equivalent to Laplace transforms for solving differential equations.

Heaviside read Maxwell's famous treatise on electricity and magnetism as a young man. He writes: "I saw that it was great, greater and greatest with prodigious possibilities in its power. I was determined to master the book and set to work... it took me several years to understand as much as I possibly could. Then I set Maxwell aside and followed my own course... I progressed much more quickly".

Oliver Heaviside was the first recipient of the Faraday Medal in 1922, awarded by what is now the Institution of Engineering and Technology.

Bonjour, Monsieur Martingale!

Here is a conversation with Joseph ("Joe") Leo Doob, the American mathematician from Ohio famous for his martingale convergence theorems. http://www.dartmouth.edu/~chance/Doob/conversation.html 

What motivated Joe to develop deep results in applied probability? In the interview, Joe reveals it was his desire to convert common probabilistic intuition into mathematics (statements and proofs) that drove him to discover his theorems. 

The Wikipedia entry on martingales is also interesting. Doob's foray into martingale theory coincided with the Great Depression, which began in 1929.

Other notable statistical activity at that time included Kolmogorov's axiomatisation of probability in 1933. Read Doob's bio here.

Sunday 2 November 2008

Math Jargon for November

November's math jargon:

supremum, sup - simply a jargonista way of saying "least upper bound"
sufficient statistic - a concept introduced by Fisher in 1920; refers to a function of a data set that summarises all the sample information in that set. Want more precision, and a chat about Rao-Blackwellisation? Go here.
Goldbach conjecture - if n > 4 is even, then n is the sum of two odd primes (rephrased: even numbers greater than 4 are the sum of two odd primes). The conjecture is named after Prussian mathematician Christian Goldbach.
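
The conjecture is unproven, but checking it for small even numbers is a nice five-minute exercise; a naive sketch (the trial-division primality test is deliberately simple, not efficient):

```python
def goldbach_pair(n):
    """Return a pair of odd primes summing to even n > 4, or None."""
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    for p in range(3, n // 2 + 1, 2):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None
```

For instance, goldbach_pair(100) finds (3, 97); no even number above 4 has ever been found without such a decomposition.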

Math-programmers are also well-advised to know ALL the greek letters used in mathematics (commonly forgotten or under-used ones are eta, kappa, nu, upsilon (looks like nu), psi).

Sunday 26 October 2008

The Art of Mathematics

An interesting BBC video on dynamical systems, created by Lasse Rempe of the University of Liverpool.

http://news.bbc.co.uk/1/hi/sci/tech/7617191.stm

An interesting introduction to dynamical systems is David Luenberger's Introduction to Dynamical Systems: Theory, Models and Applications. (David is a professor of management science and engineering at Stanford and has published lots of articles in journals like "IEEE Transactions on Automatic Control" and "SIAM Journal on Control and Optimization").

Here is an interesting edition of the IEEE Automatic Control journal focusing on Systems Biology.

http://ieeexplore.ieee.org/xpl/tocresult.jsp?isYear=2008&isnumber=4439798&Submit32=Go+To+Issue

Thursday 23 October 2008

The GSL

http://www.gnu.org/software/gsl/

The GNU Scientific Library can compute FFTs, solve differential equations and perform least-squares fitting, amongst other things.
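
GSL itself is a C library, but to see what its straight-line least-squares fit actually computes, here is the closed-form solution as a pure-Python sketch (an illustration of the maths, not the GSL API):

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = m*x + c, minimising the sum
    of squared vertical residuals.  This is the calculation a linear-fit
    routine in a library like the GSL performs under the hood."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c
```

Fitting points that lie exactly on y = 2x + 1 recovers slope 2 and intercept 1, as expected.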

Monday 29 September 2008

Numerical Programming using Haskell and the Glasgow Haskell Compiler

ghc 6.8.3 - the Glasgow Haskell (Optimising) Compiler. Downloaded from haskell.org. It requires more memory than Hugs, but the programs it produces run much faster. GHC's developer website tracks all the bugs and proposals for GHC. Haddock, a documentation generator for Haskell, is worth checking out. Pleasantly, the Windows installer adds ghc.exe to your PATH.

Haskell is one of the purest functional languages around. It tries to avoid using state and mutable data, and instead works by evaluating expressions. This lack of state can make it a difficult language for implementing numerical algorithms.

Still - there's no harm in trying! Here are some interesting papers on this subject, including one that develops functional code for quantum mechanics. Also worth keeping tabs on Haskell's mathematics library listing.

Wednesday 24 September 2008

The Use of Super in Python

Read THIS "super" article from Artima.

And the NIFTY vs HARMFUL argument HERE.
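
The crux of that debate is that super() follows the class's method resolution order (MRO) rather than a simple "parent" pointer, which only really bites with multiple inheritance. A minimal diamond of my own construction illustrates:

```python
class Base(object):
    def greet(self):
        return ["Base"]

class Left(Base):
    def greet(self):
        # super() delegates to the NEXT class in the MRO, which here
        # turns out to be Right, not Left's textual parent Base.
        return ["Left"] + super(Left, self).greet()

class Right(Base):
    def greet(self):
        return ["Right"] + super(Right, self).greet()

class Child(Left, Right):
    def greet(self):
        return ["Child"] + super(Child, self).greet()
```

Child().greet() visits Child, Left, Right, Base in MRO order, and each class's greet runs exactly once - the "nifty" cooperative behaviour that the "harmful" camp argues is too subtle.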

Thursday 3 July 2008

Cryptography and the Swedish Connection

This post relates to a question that arose as I was debugging a Python program using pdb. I wrote a small program which was spending most of its time inside sre_compile.py (via pickle.py -> re.py -> sre_compile.py). I looked inside sre_compile.py and found this was "Secret Labs'" regular expression library. So who are Secret Labs and do they have anything to do with cryptography?

This website is the home of Secret Labs AB. They are based in Sweden and have been working with Python since 1997. AB stands for Aktiebolag ("stock company"). Did you know SAAB is actually short for Svenska Aeroplan Aktiebolaget?

Wednesday 18 June 2008

Cryptographic Checksums

Cryptographic checksums are simply hash functions which have been deemed "secure" according to certain criteria. In association with public key algorithms they are used to produce digital signatures. Here is a canonical program utilising the MD5 checksum in PyCrypto:

from Crypto.Hash import MD5
m = MD5.new()
m.update("MD5 is strengthened MD4")
print "checksum: " + m.hexdigest()  # hexdigest() is printable; digest() returns raw bytes

Read the RFC written by Ron Rivest. A quick guide to cryptographic hash functions is on Columbia University's website.

SHA is a secure hash function designed for use with the Digital Signature Standard (check out the official DSS specification). It's easy to use in PyCrypto and produces slightly longer digests (160 bits instead of the 128 bits of MD2 through MD5). Here's the sample code:

from Crypto.Hash import SHA
m = SHA.new()
m.update("SHA is more resistant to brute force attacks than MD5")
print "checksum: " + m.hexdigest()  # 160-bit digest as 40 hex characters
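
On a modern Python, the same digests are also available from the standard-library hashlib module (no PyCrypto required); a quick sketch using the well-known test vector "abc":

```python
import hashlib

# hashlib exposes MD5, SHA-1 and friends with a uniform interface;
# hexdigest() returns the checksum as a printable hex string.
md5_sum = hashlib.md5(b"abc").hexdigest()
sha1_sum = hashlib.sha1(b"abc").hexdigest()
print(md5_sum)   # 32 hex chars = 128 bits
print(sha1_sum)  # 40 hex chars = 160 bits
```

The length difference (128 vs 160 bits) is exactly the MD5-versus-SHA gap mentioned above.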

Tuesday 17 June 2008

AES - The Wide Trail Wonder

The Advanced Encryption Standard, also known as Rijndael, was developed by two Belgian cryptographers and succeeds DES. The code for encrypting with AES in Python is almost identical to that for DES, except that instead of an 8-byte (64-bit) key, the key must be 16, 24 or 32 bytes long.

The creators of Rijndael published a book in 2002 explaining the mathematical arguments underlying its construction (Design of Rijndael, Joan Daemen and Vincent Rijmen, Springer). In it they describe the "wide trail strategy" used in the algorithm's design.

Here is a paper explaining wide trail written by the creators of Rijndael. Read it.

Monday 16 June 2008

DES - Defending against Differential Cryptanalysis

DES is a symmetric encryption algorithm developed in the 1970s by IBM, based on an algorithm called Lucifer, and was adopted as a federal standard in 1976. The first step in encrypting with DES is to create a DES object. The call

obj=DES.new('abc', DES.MODE_ECB)

fails with the following ValueError: Key must be 8 bytes long not 3. In fact, DES uses a 56-bit key, but 64 bits must be specified, with every 8th bit reserved for parity checking. A handful of keys are considered weak (example: alternating 1s and 0s). We have chosen to use DES in ECB (electronic codebook) mode (note that a cipher mode should not compromise the security of the underlying algorithm); the DES package offers other modes including CBC, CFB, OFB and PGP. The other thing we need to know about DES is that it is a block cipher, encrypting data in 64-bit blocks, as opposed to a stream cipher, which encrypts data one bit (or one byte) at a time. Right, now assume we have changed the key to 8 bytes (e.g. abcdefgh). We would like to encrypt a message.

data="the rain in hawaii stays mainly in mount waialeale"
ciphertext=obj.encrypt(data)

This also results in a ValueError because we are not obeying the data restrictions specified by DES, namely that "input strings must be a multiple of 8 in length". This is because DES operates on blocks of 64 bits (8 bytes). We can easily pad the data to a multiple of 8 bytes as follows.

if len(data) % 8 != 0: data += "x" * (8 - len(data) % 8)

If you now do obj.encrypt on the data and print the results you will see the message beautifully encrypted.
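
That ad-hoc padding can be wrapped in a small helper (a sketch of the idea only; real systems use a reversible scheme such as PKCS#7 so the receiver knows how much padding to strip):

```python
BLOCK = 8  # DES block size in bytes (64 bits)

def pad(data, block=BLOCK, filler="x"):
    """Pad data up to the next multiple of the block size.  Data that is
    already block-aligned is returned unchanged."""
    shortfall = (-len(data)) % block
    return data + filler * shortfall
```

With this, any plaintext passed to obj.encrypt satisfies the multiple-of-8-bytes restriction.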

To understand how this algorithm works, unfortunately, we cannot just use the Python debugger. I have installed the source at C:\Python25\pycrypto-2.0.1\src\DES.c. (You will also see DES3.c for Triple DES, AES.c for the Advanced Encryption Standard and ARC4 for the Alleged RC4 algorithm in this directory.)

DES operates on the principles of confusion and diffusion, described in Claude Shannon's "Communication Theory of Secrecy Systems" (1949). Confusion aims to muddy the relationship between plaintext and ciphertext, using techniques like substitution, as in the Caesar cipher (where every letter is replaced by another). Diffusion aims to spread the redundancy of the plaintext through the ciphertext. One way to achieve this is via transposition, rearranging the letters of the plaintext.
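
Both techniques are easy to demonstrate in miniature (toy ciphers for illustration only - both are trivially breakable; the function names are my own):

```python
import string

def caesar(plaintext, shift):
    """Substitution: replace each lowercase letter by the one 'shift'
    places along the alphabet (confusion, in Shannon's sense)."""
    a = string.ascii_lowercase
    table = str.maketrans(a, a[shift:] + a[:shift])
    return plaintext.translate(table)

def transpose(plaintext, cols):
    """Transposition: write the text row-by-row into 'cols' columns and
    read it back column-by-column (diffusion: letters are rearranged,
    spreading local patterns across the ciphertext)."""
    return "".join(plaintext[i::cols] for i in range(cols))
```

For example, caesar("hello", 3) gives "khoor". DES interleaves far stronger versions of both operations, driven by the key, sixteen times over.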

DES comprises 16 rounds of substitutions and permutations based on the 56-bit key.

Some specific details are available from IBM Research, including how DES deals with differential cryptanalysis.

PyCrypto installed!

I have installed the Windows binaries for PyCrypto version 2.0.1 into my C:\Python25 directory. I figured I needed a real cryptography library, having in the past used the deprecated Enigma-style rotor module, which is no longer part of the Python distribution. To check PyCrypto has installed correctly, open an interactive session:

>>> from Crypto.Cipher import DES

As long as you don't get an ImportError you're ready to roll.