
Multi-layer feed-forward neural network

A network with one or more hidden layers of neurons (perceptrons). Each hidden layer accepts inputs only from the layer immediately preceding it and passes outputs only to the layer immediately succeeding it.

We don’t use the step function in Multilayer FFNNs; instead we use a differentiable function with a smooth output, such as the sigmoid, as the activation function.

The output units are a function of the hidden units, which in turn are a function of the input units:
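One way to write this for a network with a single hidden layer (the symbols here are my own shorthand, not defined elsewhere in this note: $g$ is the activation function, $\mathbf{x}$ the input vector, and $W^{(1)}$, $W^{(2)}$ the weight matrices of the hidden and output layers):

$$\mathbf{a} = g\left(W^{(2)}\, g\left(W^{(1)}\mathbf{x}\right)\right)$$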

Expressiveness of Multilayer FFNNs

Using the sigmoid instead of the step function means Multilayer FFNNs can be used to create arbitrarily complex functions: with enough hidden units, even a network with a single hidden layer can approximate any continuous function on a bounded domain to arbitrary accuracy.
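As a concrete illustration (a minimal sketch with weights I picked by hand, not anything from this note): a single hidden layer of sigmoid units can compute XOR, a function no single perceptron can represent. The large weight magnitudes push each sigmoid close to 0 or 1, so it behaves like a soft step function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    h_or  = sigmoid(20*x1 + 20*x2 - 10)      # ~1 when x1 OR x2 is on
    h_and = sigmoid(20*x1 + 20*x2 - 30)      # ~1 only when both are on
    return sigmoid(20*h_or - 20*h_and - 10)  # "OR and not AND" = XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2), 3))
# Prints outputs close to 0, 1, 1, 0.
```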

Loss function

The loss function for our networks is the squared error: at each output node we take the square of the difference between the predicted value and the true value expected at that node, and sum these over all the output nodes.
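In symbols, for a network with $m$ output nodes, where $y_k$ is the true value and $a_k$ the predicted value at output node $k$:

$$Loss(\mathbf{w}) = \sum_{k=1}^m(y_k - a_k)^2$$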

We want to minimise the loss function, so our learning task is to find the set of weights for which the loss is smallest. We use the gradient descent algorithm to find those weights. The gradient of the loss function with respect to a specific weight tells us the direction in which to update that weight: since the gradient points towards the steepest increase in the loss, we step against it.
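Concretely, gradient descent repeatedly applies an update of the following form, where $\alpha$ is a small step size (the learning rate, a symbol not used elsewhere in this note):

$$\mathbf{w} \leftarrow \mathbf{w} - \alpha\,\frac{\partial}{\partial \mathbf{w}}Loss(\mathbf{w})$$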

$$\frac{\partial}{\partial \mathbf{w}}Loss(\mathbf{w}) = \frac{\partial}{\partial \mathbf{w}}\sum_{k=1}^m(y_k - a_k)^2 = \sum_{k=1}^m\frac{\partial}{\partial \mathbf{w}}(y_k - a_k)^2$$

Here $y_k$ is the expected output and $a_k$ the predicted output at output node $k$, so the derivative of the squared difference with respect to the weight vector gives the gradient (rate of change) of the loss for that perceptron. The derivative of a sum is equal to the sum of the derivatives (the sum rule), which lets us compute the gradient one output node at a time.
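Carrying a single term one step further with the chain rule (my own working, following the standard derivation): if the output unit computes $a_k = g(\mathbf{w} \cdot \mathbf{x})$, where $\mathbf{x}$ here stands for whatever inputs feed that unit, then

$$\frac{\partial}{\partial \mathbf{w}}(y_k - a_k)^2 = -2(y_k - a_k)\,\frac{\partial a_k}{\partial \mathbf{w}} = -2(y_k - a_k)\,g'(\mathbf{w} \cdot \mathbf{x})\,\mathbf{x}$$

The sigmoid is especially convenient here because its derivative has the simple form $g'(z) = g(z)(1 - g(z))$.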

To actually compute these gradients for every weight in the network, we use backpropagation, which applies the chain rule layer by layer, passing error terms backwards from the outputs.
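A minimal sketch of what that looks like in code (my own illustration, with assumed names like `W1`, `W2` and `alpha`, not code from this note): one layer of sigmoid hidden units, sigmoid outputs, the squared-error loss above, and a single gradient-descent step computed by backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, x):
    # Append a constant 1 so the last column of each weight matrix acts as a bias.
    h = sigmoid(W1 @ np.append(x, 1.0))   # hidden activations
    a = sigmoid(W2 @ np.append(h, 1.0))   # output activations
    return h, a

def train_step(W1, W2, x, y, alpha=0.5):
    h, a = forward(W1, W2, x)
    # Output error: dLoss/da = -2(y - a), times the sigmoid derivative a(1 - a).
    delta_out = -2.0 * (y - a) * a * (1.0 - a)
    # Propagate the error back through W2 (dropping its bias column),
    # scaling by the hidden units' own sigmoid derivative h(1 - h).
    delta_hid = (W2[:, :-1].T @ delta_out) * h * (1.0 - h)
    # Gradient descent: move every weight against its gradient.
    W2 -= alpha * np.outer(delta_out, np.append(h, 1.0))
    W1 -= alpha * np.outer(delta_hid, np.append(x, 1.0))
    return np.sum((y - a) ** 2)

# Usage: learn XOR from its four examples. Depending on the random
# initialisation, training can stall in a local minimum; another seed
# (or more hidden units) usually fixes that.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))  # 2 hidden units, 2 inputs + bias
W2 = rng.normal(size=(1, 3))  # 1 output unit, 2 hidden units + bias
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for _ in range(10000):
    for xs, t in data:
        train_step(W1, W2, np.array(xs, dtype=float), np.array([t], dtype=float))
for xs, _ in data:
    print(xs, forward(W1, W2, np.array(xs, dtype=float))[1])
```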
