Sunday 8 May 2016

Topology for Top Thinkers (Particularly Pythonistas)

Topology is worth understanding for non-mathematicians as well as mathematicians. For example, Pythonistas who need to program mathematics that employs topological concepts can benefit enormously from understanding topology. This may become increasingly important as topology finds applications in physics and the biological sciences.

Topology deals with the properties of a space that are preserved under continuous deformations such as stretching and bending.

Gottfried Leibniz was already thinking along the lines of a topological science as early as the 17th century with his "geometria situs" (Latin for geometry of place).

Today, there are many subfields of topology, including one intriguingly named differential topology, which deals with differentiable functions on differentiable manifolds.

Knot theory is another interesting branch of topology. It is used to study the effect of certain enzymes on DNA.

A knot is an embedding of a circle in three-dimensional Euclidean space. Note that the word "embedding" (also spelled "imbedding") has a very specific, technical meaning in mathematics.
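To make the definition concrete, here is a minimal sketch in R that samples points on the trefoil, one of the simplest non-trivial knots. It assumes a commonly used parametrisation of the trefoil; the variable names are illustrative only. Each value of theta on the circle is mapped to a point in three-dimensional space, and the curve returns to its starting point.

```r
# Sample a commonly used parametrisation of the trefoil knot.
# theta runs once around the circle, from 0 to 2*pi.
theta <- seq(0, 2 * pi, length.out = 201)

x <- sin(theta) + 2 * sin(2 * theta)
y <- cos(theta) - 2 * cos(2 * theta)
z <- -sin(3 * theta)

# One row per sampled point in three-dimensional Euclidean space.
trefoil <- cbind(x, y, z)

# The curve is closed: the last sampled point coincides with the first
# (up to floating-point error).
closesUp <- isTRUE(all.equal(trefoil[1, ], trefoil[nrow(trefoil), ]))
```

Sampling only shows that the curve is a closed loop. It does not prove that the map is an embedding in the technical sense.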

Thursday 7 April 2016

Basics of R - Including how to specify path names in R

If you don't know R, or are coming back to R after a long break, the following FAQ may be a good refresher.

Is R case-sensitive?

Yes, R is case-sensitive. Many built-in function names, however, are entirely lower case.

Examples of lower-case functions include getwd() and setwd("putDirectoryNameHere").
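A short sketch of what case sensitivity means in practice. The variable names are made up for illustration.

```r
# Names that differ only in case refer to different objects.
myValue <- 10
MyValue <- 20
distinct <- myValue != MyValue        # TRUE: two separate variables

# Function names must be typed with their exact case.
currentDir <- getwd()                 # works
upperCaseExists <- exists("GETWD")    # FALSE in a standard session
```

Calling GETWD() would fail with a "could not find function" error.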

Is R interpreted or compiled?

R is an interpreted language.

How do I create a data vector in R?

Creating a data vector is one of the most fundamental operations in R. You do so using the c() function and the assignment operator (<-). For example, a data vector of the numbers 1, 2, 3, 4 and 5 can be created using:

myVector <- c(1, 2, 3, 4, 5)
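As a rough illustration of what you can do once the vector exists, the sketch below uses only base R functions. The variable names other than myVector are illustrative.

```r
# Build the vector with c() and the assignment operator.
myVector <- c(1, 2, 3, 4, 5)

vectorLength <- length(myVector)   # number of elements
thirdElement <- myVector[3]        # R indexes from 1, not 0
doubled <- myVector * 2            # arithmetic applies element by element
vectorMean <- mean(myVector)       # arithmetic mean
```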

How do I calculate correlation given two data vectors a and b?

cor(a, b)
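A minimal worked example, with made-up data in which b is exactly twice a, so the two vectors are perfectly correlated:

```r
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)   # b is exactly 2 * a

# cor() computes the Pearson correlation by default.
pearson <- cor(a, b)

# Other methods can be requested through the method argument.
spearman <- cor(a, b, method = "spearman")
```

Both vectors must have the same length.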

How do I import an R script into the current session?

Use the source function.  Example: source("myScript.R").
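The sketch below shows the effect of source() end to end. To stay self-contained it writes a one-line script to a temporary file first; the file and the name helperValue are purely illustrative.

```r
# Write a tiny script to a temporary file (stand-in for myScript.R).
scriptPath <- tempfile(fileext = ".R")
writeLines("helperValue <- 6 * 7", scriptPath)

# Run the script; objects it defines become available in the session.
source(scriptPath)

# Clean up the temporary file.
unlink(scriptPath)
```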

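How do I specify path names in R?

Use a forward slash instead of a backslash in path names, even on Windows. Inside an R string the backslash introduces an escape character, so a literal backslash has to be written as a double backslash.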
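A brief sketch of the options, using a hypothetical Windows path. The strings are only built here, not opened, so the file does not need to exist.

```r
# Forward slashes work on every platform, including Windows.
forwardPath <- "C:/Users/me/data/input.csv"        # hypothetical path

# A literal backslash must be doubled, because a single backslash
# starts an escape sequence inside an R string.
escapedPath <- "C:\\Users\\me\\data\\input.csv"

# file.path() joins components with forward slashes for you.
builtPath <- file.path("C:", "Users", "me", "data", "input.csv")

samePath <- identical(forwardPath, builtPath)
sameLength <- nchar(escapedPath) == nchar(forwardPath)  # each \\ is one character
fileName <- basename(forwardPath)
```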

Can I run R "unattended"?

Yes, R has a batch mode, invoked from the command line with R CMD BATCH followed by the name of the script.
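As an illustration, here is what a small script intended for unattended use might look like. The file name myScript.R is hypothetical, and the shell invocation is shown only in the comments.

```r
# Contents of a hypothetical script, myScript.R.
# Run it unattended from a shell with:
#   R CMD BATCH myScript.R
# By default the output is written to myScript.Rout.

# interactive() is FALSE when the script runs in batch mode.
runMode <- if (interactive()) "interactive" else "batch"

result <- sum(1:10)
print(paste("Running in", runMode, "mode; result =", result))
```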

Wednesday 17 February 2016

MADlib - Big Data Machine Learning the Apache Way

MADlib is a machine learning library for data scientists. The MAD stands for "Magnetic, Agile and Deep". The core idea is to do big data analytics inside the database. Another key principle of MADlib is to leverage MPP (massively parallel processing) shared-nothing architectures, first described by Michael Stonebraker at the University of California, Berkeley. Joe Hellerstein at UCB is a big promoter of MADlib.