Maths is Back and it is Python: July 2025

Wednesday, 9 July 2025

Revisiting Map Reduce

Map reduce is a widely used pattern in distributed data processing.

It's an idea from functional programming. A map function applies the same operation to every item in a list (a filter, which keeps only the items matching a condition, often sits alongside it), and a reduce function then combines the results into a single summary value.

A Mickey Mouse example to bring this to life: you have a list of fishing vessels, you filter for foreign flags, and then you add them all up. Run this every 6 hours, store each result, and you have a time series database of foreign vessel counts.
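The fishing vessel example can be sketched in plain Python using the built-in filter and map together with functools.reduce. The vessel records, the field names and the choice of "GB" as the home flag are all made up for illustration, and this runs on one machine rather than being distributed.

```python
from functools import reduce

# Hypothetical snapshot of vessels; names, fields and flags are assumptions.
vessels = [
    {"name": "Aurora", "flag": "GB"},
    {"name": "Bonito", "flag": "ES"},
    {"name": "Cormorant", "flag": "GB"},
    {"name": "Dorade", "flag": "FR"},
]

HOME_FLAG = "GB"  # assumed home flag; anything else counts as foreign


def count_foreign(vessels, home_flag=HOME_FLAG):
    """Count vessels whose flag differs from the home flag."""
    # Filter: keep only the foreign-flagged vessels.
    foreign = filter(lambda v: v["flag"] != home_flag, vessels)
    # Map: turn each remaining vessel into the number 1.
    ones = map(lambda v: 1, foreign)
    # Reduce: add the ones up, starting from 0 so an empty list gives 0.
    return reduce(lambda total, one: total + one, ones, 0)


foreign_count = count_foreign(vessels)
print(foreign_count)  # 2
```

In a distributed setting the filter and map steps would run on many machines in parallel, each over its own slice of the data, with the partial sums combined at the end.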

A popular open source implementation is Apache Hadoop.

Thursday, 3 July 2025

The Autoregressive Nature of LLM Operations

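LLMs are autoregressive generative models: they produce text one token at a time, with each token conditioned on the tokens that came before it.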

An autoregression is a regression of a variable against its own lagged values.

For example, an AR(1) model predicts the current value from the immediately preceding value, while an AR(n) model uses the n most recent values.
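A minimal sketch of the idea, assuming the coefficients are already known (in practice they would be estimated from data) and leaving out the random noise term of a full AR model. The function names ar_predict and ar_generate are illustrative only. The generate loop feeds each prediction back in as input, which is the same pattern an LLM follows over tokens.

```python
def ar_predict(history, coeffs, intercept=0.0):
    """One-step AR(n) forecast, where n = len(coeffs).

    coeffs[0] weights the most recent value, coeffs[1] the one before, etc.
    """
    n = len(coeffs)
    if len(history) < n:
        raise ValueError("need at least n past values")
    # Take the n most recent values, most recent first.
    recent = history[len(history) - n:][::-1]
    return intercept + sum(c * x for c, x in zip(coeffs, recent))


def ar_generate(history, coeffs, steps, intercept=0.0):
    """Roll the model forward, feeding each prediction back in as input."""
    values = list(history)
    for _ in range(steps):
        values.append(ar_predict(values, coeffs, intercept))
    return values[len(history):]


# AR(1) with coefficient 0.5: each value is half the previous one.
print(ar_generate([8.0], [0.5], steps=3))  # [4.0, 2.0, 1.0]

# AR(2) with coefficients (1, 1): each value is the sum of the previous two.
print(ar_predict([1.0, 2.0], [1.0, 1.0]))  # 3.0
```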

One remarkable feature of LLMs is in-context learning, which has been hypothesised to be Bayesian in nature.

The Python Datasets library

Hugging Face maintains a Python library called datasets, which provides access to natural language training data sets, amongst others.
