The mathematics of AI (machine learning mathematically)
What is machine learning?
▶ Task Identify handwritten digits
▶ We can see this as a function in the following way:
▶ Convert the pictures into grayscale values, e.g. a 28 × 28 grid of numbers
▶ Flatten the result into a vector, e.g. a 28 × 28 grid ↦ a vector with 28² = 784 entries
▶ The output is a vector with 10 entries (one entry per digit)
▶ We thus have a function R^784 → R^10 (sketched in code below)
Input example: a picture of a handwritten digit; output example: the corresponding label.
Task – rephrased
We have a function R^784 → R^10
How can we describe this function?
Machine/deep learning (today’s topic) tries to answer questions of this type by letting a computer detect patterns in data
Crucial In ML the computer performs tasks without explicit instructions
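A minimal code sketch of this viewpoint (PyTorch; the random weight matrix below is just a placeholder for the unknown function, not a trained model):

```python
import torch

# One handwritten digit as a 28 × 28 grid of grayscale values (random here).
image = torch.rand(28, 28)

# Flatten the grid into a vector with 28 * 28 = 784 entries.
x = image.reshape(784)

# A digit classifier is then some function R^784 -> R^10; as a stand-in we
# use a random 10 × 784 matrix, i.e. an untrained linear map.
W = torch.randn(10, 784)
y = W @ x

print(x.shape, y.shape)  # torch.Size([784]) torch.Size([10])
```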
▶ Idea Approximate the unknown function R^784 → R^10
▶ Neural network = a piecewise linear approximation (matrices + PL maps)
▶ The matrices = a bunch of numbers (weights) and offsets (biases)
▶ The PL maps = usually ReLU, i.e. x ↦ max(0, x) applied entrywise
▶ Machine learning mantra Forward → Loss → Backward → Forward → Loss → Backward → ...
▶ Forward = calculate an approximation (start with random parameters)
▶ Loss = compare to real data
▶ Backward = adjust the approximation
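To make the mantra concrete, here is a toy one-parameter example in plain Python (not from the slides): we try to learn the unknown function y = 3x by repeating forward, loss, backward with a hand-computed gradient.

```python
import random

data = [(x, 3.0 * x) for x in range(-5, 6)]  # made-up training data for y = 3x
a = random.uniform(-1.0, 1.0)                # start with a random parameter
lr = 0.01                                    # step size for each adjustment

for step in range(100):
    # Forward: calculate the current approximation a * x on the data.
    preds = [(x, a * x, y) for x, y in data]
    # Loss: compare to the real data via the mean squared error.
    loss = sum((p - y) ** 2 for _, p, y in preds) / len(preds)
    # Backward: adjust the parameter along the gradient of the loss.
    grad = sum(2 * (p - y) * x for x, p, y in preds) / len(preds)
    a -= lr * grad

print(a)  # close to 3 after a few steps
```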
What is a neural network (nn)?
▶ NN = a directed graph as above
▶ The task of a nn is to approximate an unknown function
▶ It consists of neurons = entries of vectors, and weights = entries of matrices
Example
Here we have two matrices, a 3-by-1 and a 2-by-3 matrix,
\begin{pmatrix} w \\ x \\ y \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix},
plus two bias terms, as in y = matrix · x + bias
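As a sketch, the same two layers in PyTorch; reading the shapes off the slide, the network maps R^1 → R^3 → R^2 (the input size 1 is inferred from the 3-by-1 weight matrix):

```python
import torch.nn as nn

# Two affine layers y = matrix · x + bias with the shapes from the slide:
# a 3-by-1 weight matrix (w, x, y)^T and a 2-by-3 weight matrix (a_ij),
# each with its own bias vector (the "two bias terms").
layer1 = nn.Linear(in_features=1, out_features=3)  # weight: 3 × 1, bias: 3 entries
layer2 = nn.Linear(in_features=3, out_features=2)  # weight: 2 × 3, bias: 2 entries

print(layer1.weight.shape, layer2.weight.shape)  # torch.Size([3, 1]) torch.Size([2, 3])
```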
Example
Here we have four matrices (plus four biases), whose composition gives a map
R^3 → R^4 → R^3 → R^3 → R^2
Actually...
we need nonlinear maps as well, say ReLU applied componentwise
Here we have four matrices, whose composition gives a map
R^3 --ReLU∘matrix--> R^4 --ReLU∘matrix--> R^3 --ReLU∘matrix--> R^3 --ReLU∘matrix--> R^2
But ignore that for now
ReLU doesn’t learn anything; it just brings in nonlinearity
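A sketch of this four-layer network in PyTorch; following the slide literally, every arrow is ReLU after an affine map (in practice the last ReLU is often dropped):

```python
import torch
import torch.nn as nn

# The map R^3 -> R^4 -> R^3 -> R^3 -> R^2, each arrow = ReLU ∘ (matrix + bias).
net = nn.Sequential(
    nn.Linear(3, 4), nn.ReLU(),
    nn.Linear(4, 3), nn.ReLU(),
    nn.Linear(3, 3), nn.ReLU(),
    nn.Linear(3, 2), nn.ReLU(),
)

x = torch.randn(3)    # a point in R^3
print(net(x).shape)   # torch.Size([2]), a point in R^2
```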
For the small two-layer example from before, the weights and biases can be collected layerwise as
\begin{pmatrix} a^1_{11} & b^1_1 \\ a^1_{12} & b^1_2 \\ a^1_{13} & b^1_3 \end{pmatrix}, \quad \begin{pmatrix} a^2_{11} & a^2_{12} & a^2_{13} & b^2_1 \\ a^2_{21} & a^2_{22} & a^2_{23} & b^2_2 \end{pmatrix}
▶ The a^k_{ij} and b^k_i are the parameters of our nn
▶ k = number of the layer
▶ Deep = many layers = better approximation
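Counting the parameters is a one-liner; for the four-layer sketch above (assuming the `net` from the previous snippet) one gets (3·4+4) + (4·3+3) + (3·3+3) + (3·2+2) = 51, and the same line works for much bigger models:

```python
# All a^k_ij and b^k_i of the network, layer by layer and in total.
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.numel())
print(sum(p.numel() for p in net.parameters()))  # 51 for the sketch above
```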
The point
Many layers → many parameters
These are good for approximating real world problems
Examples
ResNet-152 with 152 layers (used in image classification)
VGG-19 with 19 layers (used in image classification)
GoogLeNet with 22 layers (used in face detection)
Side fact
Gaming has improved AI!?
A GPU can do e.g. matrix multiplications faster than a CPU, and lots of nn run on GPUs
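In PyTorch the move to a GPU is one line (a sketch, assuming the `net` from before and an available CUDA device):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
net = net.to(device)                 # copy all weights to the GPU
x = torch.randn(3, device=device)    # inputs must live on the same device
print(net(x).device)
```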
How learning works
▶ Supervised learning Create a dataset with answers, e.g. pictures of handwritten digits plus their labels
▶ There are other forms of learning, e.g. unsupervised, which I skip
▶ Split the data into ≈80% training and ≈20% testing data (sketched in code below)
Idea to keep in mind
How to train students?
There are lectures, exercises etc.: the training data
There is a final exam: the testing data
Depending on their performance, we let them out into the wild
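One way to get such a split in PyTorch (a sketch; MNIST also ships with its own predefined test set, and the download path "./data" is just an example):

```python
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 60000 labelled pictures of handwritten digits.
dataset = datasets.MNIST("./data", train=True, download=True,
                         transform=transforms.ToTensor())

n_train = int(0.8 * len(dataset))                      # ≈80% training data
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
print(len(train_set), len(test_set))                   # 48000 12000
```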
How learning works
▶ Forward Run the nn = function on the training data
▶ Loss Calculate the difference “results − answers” (⇒ loss function)
▶ Backward Change the parameters, trying to minimize the loss function
▶ Repeat
Forward
Boils down to a bunch of matrix multiplications, followed by the nonlinear activation, e.g. ReLU
Loss
The difference between real values and predictions
Task Minimize the loss function
Backward
This is running gradient descent on the loss function
Slogan Adjust parameters following the direction of steepest descent
And what makes it even better: you can try it yourself
My favorite tool is PyTorch, but there are other libraries as well
Let us see how!
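A minimal end-to-end sketch in PyTorch; the layer sizes, learning rate and number of epochs below are illustrative choices, not prescriptions from the slides:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Data: pictures of handwritten digits plus their labels.
transform = transforms.ToTensor()
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# The network: a map R^784 -> R^10 built from matrices, biases and ReLU.
net = nn.Sequential(
    nn.Flatten(),                     # 28 × 28 picture -> vector with 784 entries
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 10),               # 10 scores, one per digit
)
loss_fn = nn.CrossEntropyLoss()                         # the loss function
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)   # gradient descent

for epoch in range(3):
    for images, labels in train_loader:
        outputs = net(images)             # Forward
        loss = loss_fn(outputs, labels)   # Loss
        optimizer.zero_grad()
        loss.backward()                   # Backward: gradients of the loss
        optimizer.step()                  # adjust the parameters

# Testing: the "final exam" on data the network has never seen.
correct = 0
with torch.no_grad():
    for images, labels in test_loader:
        correct += (net(images).argmax(dim=1) == labels).sum().item()
print("test accuracy:", correct / len(test_set))
```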
There is still much to do...
Thanks for your attention!
The mathematics of AI. Or: Learning = forward, loss, backward. April 2024