Linear Heteroassociator

David Wallace Croft

2005-02-16

This was an exercise that I worked through with the help of Dr. Richard M. Golden in his course, Neural Net Mathematics.

Identities

Vector norm squared to dot product:

\| x \|^2 = x^T x

Vector derivative:

\frac{d ( x^T x )}{d x} = 2 x^T

Linear Heteroassociator

For this exercise, we use a simple neural network with 3 inputs and 2 outputs. The network will be trained with 4 stimulus patterns.

The linear heteroassociator maps each stimulus to a response:

r_{i\,(2 \times 1)} = W_{(2 \times 3)} \, s_{i\,(3 \times 1)}

There are 4 stimulus patterns, so n = 4.

The 4 stimulus vectors, each 3x1, make a 3x4 matrix:

S_{(3 \times 4)} = [\, s_1 \;\; s_2 \;\; s_3 \;\; s_4 \,]

The 4 response vectors, each 2x1, make a 2x4 matrix:

R_{(2 \times 4)} = [\, r_1 \;\; r_2 \;\; r_3 \;\; r_4 \,]

The combined response matrix:

R_{(2 \times 4)} = W_{(2 \times 3)} \, S_{(3 \times 4)}

Objective Function

To train the weight matrix, an objective function is minimized. For a given stimulus pattern s_i, the difference d_i between the desired output o_i and the actual response r_i is

d_i = o_i - r_i

Minimize the average squared error:

l(W) = n^{-1} \sum_{i=1}^{n} \| o_i - r_i \|^2

Substitute d_i:

l(W) = n^{-1} \sum_{i=1}^{n} \| d_i \|^2

Use the dot product identity:

l(W) = n^{-1} \sum_{i=1}^{n} d_i^T d_i

The 4 difference vectors, each 2x1, form a 2x4 matrix:

D_{(2 \times 4)} = [\, d_1 \;\; d_2 \;\; d_3 \;\; d_4 \,]

Convert the matrix to a vector by stacking its columns:

d_{(8 \times 1)} = \mathrm{vec}(D)

Replace the sum of 4 dot products with one big dot product:

l(W) = n^{-1} \, d^T_{(1 \times 8)} \, d_{(8 \times 1)}

Transpose the weight matrix and convert it to a vector:

w_{(6 \times 1)} = \mathrm{vec}(W^T)

Rewrite the objective function as a scalar function of a vector:

l(w) = n^{-1} \, d^T_{(1 \times 8)} \, d_{(8 \times 1)}

Gradient Descent

Imagine a blind man in a land with many hills and valleys. He wants to reach the lowest point in the area. With each step, he taps his staff around himself to determine the slope of the land at his current position, then takes a step downward. He eventually reaches the bottom of a valley.

The gradient descent weight update rule with learning rate \alpha:

w_{t+1} = w_t - \alpha \left[ \frac{d l(w)}{d w} \right]^T

The transpose appears because the derivative is written below as a 1x6 row vector, while w is a 6x1 column vector.

Chain Rule

\left[ \frac{d l(w)}{d w} \right]_{1 \times 6} = \left[ \frac{d l(d)}{d d} \right]_{1 \times 8} \left[ \frac{d\, d(r)}{d r} \right]_{8 \times 8} \left[ \frac{d r(w)}{d w} \right]_{8 \times 6}

First Term

\frac{d l(d)}{d d} = \frac{d \left( n^{-1} d^T d \right)}{d d}

Pull the constant out of the derivative:

\frac{d l(d)}{d d} = n^{-1} \frac{d \left( d^T d \right)}{d d}

Use the derivative identity:

\frac{d l(d)}{d d} = n^{-1} \left( 2 d^T \right)

Second Term

\frac{d\, d(r)}{d r} = \frac{d (o - r)}{d r} = -I_{8 \times 8}

Response to a Single Stimulus Pattern

r_i = \begin{bmatrix} r_{i1} \\ r_{i2} \end{bmatrix}
    = \begin{bmatrix} w_1^T s_i \\ w_2^T s_i \end{bmatrix}
    = \begin{bmatrix} w_{11} s_{i1} + w_{12} s_{i2} + w_{13} s_{i3} \\ w_{21} s_{i1} + w_{22} s_{i2} + w_{23} s_{i3} \end{bmatrix}

where w_1^T and w_2^T are the rows of W.

Combined Response Vector as a Function of the Weights

r = \begin{bmatrix} r_{11} \\ r_{12} \\ r_{21} \\ r_{22} \\ r_{31} \\ r_{32} \\ r_{41} \\ r_{42} \end{bmatrix}
  = \begin{bmatrix} w_1^T s_1 \\ w_2^T s_1 \\ w_1^T s_2 \\ w_2^T s_2 \\ w_1^T s_3 \\ w_2^T s_3 \\ w_1^T s_4 \\ w_2^T s_4 \end{bmatrix}
  = \begin{bmatrix}
      w_{11} s_{11} + w_{12} s_{12} + w_{13} s_{13} \\
      w_{21} s_{11} + w_{22} s_{12} + w_{23} s_{13} \\
      w_{11} s_{21} + w_{12} s_{22} + w_{13} s_{23} \\
      w_{21} s_{21} + w_{22} s_{22} + w_{23} s_{23} \\
      w_{11} s_{31} + w_{12} s_{32} + w_{13} s_{33} \\
      w_{21} s_{31} + w_{22} s_{32} + w_{23} s_{33} \\
      w_{11} s_{41} + w_{12} s_{42} + w_{13} s_{43} \\
      w_{21} s_{41} + w_{22} s_{42} + w_{23} s_{43}
    \end{bmatrix}

Third Term

\frac{d r(w)}{d w} = \frac{d \begin{bmatrix} r_{11} & r_{12} & r_{21} & r_{22} & r_{31} & r_{32} & r_{41} & r_{42} \end{bmatrix}^T}{d \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{21} & w_{22} & w_{23} \end{bmatrix}^T}

Third Term as an 8x6 Matrix

\frac{d r(w)}{d w} =
\begin{bmatrix}
  s_{11} & s_{12} & s_{13} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{11} & s_{12} & s_{13} \\
  s_{21} & s_{22} & s_{23} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{21} & s_{22} & s_{23} \\
  s_{31} & s_{32} & s_{33} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{31} & s_{32} & s_{33} \\
  s_{41} & s_{42} & s_{43} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{41} & s_{42} & s_{43}
\end{bmatrix}_{8 \times 6}

Third Term as a Matrix of Vectors

\frac{d r(w)}{d w} =
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

where each s_i^T and 0^T is 1x3.
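As a sanity check on the third term, here is a minimal NumPy sketch. The data, the variable names S, W, and the helper response are my own illustrative assumptions, not part of the original exercise. It builds the 8x6 matrix one 2x6 block at a time, exactly as in the block form above, and compares it against a finite-difference estimate of dr/dw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example data: 4 stimulus patterns (3x1 each) and a 2x3 weight matrix.
S = rng.standard_normal((3, 4))   # S = [ s1 s2 s3 s4 ], 3x4
W = rng.standard_normal((2, 3))   # weight matrix W, 2x3
w = W.reshape(-1)                 # w = vec(W^T): the rows of W stacked, 6x1

def response(w_vec, S):
    # r stacked per pattern: [ r_11, r_12, r_21, r_22, ... ], 8x1
    W = w_vec.reshape(2, 3)
    return (W @ S).T.reshape(-1)

# Third term dr/dw: one 2x6 block [ s_i^T 0^T ; 0^T s_i^T ] per stimulus pattern.
J = np.vstack([np.kron(np.eye(2), S[:, i]) for i in range(S.shape[1])])  # 8x6

# Finite-difference check that the analytic block structure is correct.
eps = 1e-6
J_fd = np.zeros_like(J)
for k in range(w.size):
    e = np.zeros_like(w)
    e[k] = eps
    J_fd[:, k] = (response(w + e, S) - response(w - e, S)) / (2 * eps)

print(np.allclose(J, J_fd, atol=1e-8))  # expect True
```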
Combine the three terms:

\frac{d l(w)}{d w} = \left[ n^{-1} \left( 2 d^T \right) \right]_{1 \times 8} \left[ -I_{8 \times 8} \right]
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

Move the constants and drop the identity matrix:

\left[ \frac{d l(w)}{d w} \right]_{1 \times 6} = -2 n^{-1} \, d^T_{(1 \times 8)}
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

The Weight Update Rule

Substitute the gradient into the gradient descent rule, transposing the 1x6 row vector so that the update matches the 6x1 column vector w:

w_{t+1} = w_t + 2 \alpha n^{-1}
\left( d^T_{(1 \times 8)}
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}
\right)^T
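To illustrate the update rule in action, the following NumPy sketch applies it repeatedly, using made-up stimulus and target data and an assumed learning rate; none of these values come from the original exercise. The 8x6 third-term matrix J depends only on the stimuli, so it is built once before the loop.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training data: 4 stimulus patterns and their desired 2x1 outputs.
S = rng.standard_normal((3, 4))     # stimulus matrix S, 3x4
O = rng.standard_normal((2, 4))     # desired output matrix O, 2x4
n = S.shape[1]                      # number of stimulus patterns, n = 4
alpha = 0.05                        # learning rate (assumed value)

# Third-term matrix dr/dw, 8x6; fixed because the network is linear.
J = np.vstack([np.kron(np.eye(2), S[:, i]) for i in range(n)])

o = O.T.reshape(-1)                 # desired responses stacked per pattern, 8x1
w = np.zeros(6)                     # w = vec(W^T), initialized to zero

for t in range(1000):
    r = (w.reshape(2, 3) @ S).T.reshape(-1)   # actual responses, 8x1
    d = o - r                                 # difference vector, 8x1
    w = w + 2 * alpha / n * (d @ J)           # w_{t+1} = w_t + 2 a n^-1 (d^T J)^T

W = w.reshape(2, 3)                           # recover the trained weight matrix
print(np.sum((O - W @ S) ** 2) / n)           # average squared error l(W)
```

Since the network is linear, the same update can also be written directly on the weight matrix as W <- W + 2 \alpha n^{-1} (O - W S) S^T, which avoids forming the 8x6 matrix; the vectorized form above is kept only to mirror the derivation.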

Links