Linear Heteroassociator

David Wallace Croft

2005-02-16

This was an exercise that I worked through with the help of Dr. Richard M. Golden in his course, Neural Net Mathematics.

Identities

Vector norm squared to dot product:

\| x \|^2 = x^T x

Vector derivative:

\frac{d ( x^T x )}{d x} = 2 x^T

Linear Heteroassociator

For this exercise, we use a simple neural network with 3 inputs and 2 outputs. The network will be trained with 4 stimulus patterns.

The linear heteroassociator maps each stimulus to a response:

r_{i\,(2 \times 1)} = W_{(2 \times 3)} \, s_{i\,(3 \times 1)}

There are 4 stimulus patterns, so n = 4.

The 4 stimulus vectors, each 3x1, make a 3x4 matrix:

S_{(3 \times 4)} = [\, s_1 \;\; s_2 \;\; s_3 \;\; s_4 \,]

The 4 response vectors, each 2x1, make a 2x4 matrix:

R_{(2 \times 4)} = [\, r_1 \;\; r_2 \;\; r_3 \;\; r_4 \,]

The combined response matrix:

R_{(2 \times 4)} = W_{(2 \times 3)} \, S_{(3 \times 4)}

Objective Function

To train the weight matrix, an objective function is minimized. For a given stimulus pattern s_i, the difference d_i between the desired output o_i and the actual response r_i is

d_i = o_i - r_i

Minimize the average squared error:

l(W) = n^{-1} \sum_{i=1}^{n} \| o_i - r_i \|^2

Substitute d_i:

l(W) = n^{-1} \sum_{i=1}^{n} \| d_i \|^2

Use the dot product identity:

l(W) = n^{-1} \sum_{i=1}^{n} d_i^T d_i

The 4 difference vectors, each 2x1, form a 2x4 matrix:

D_{(2 \times 4)} = [\, d_1 \;\; d_2 \;\; d_3 \;\; d_4 \,]

Convert the matrix to a vector by stacking its columns:

d_{(8 \times 1)} = \mathrm{vec}(D)

Replace the sum of 4 dot products with one big dot product:

l(W) = n^{-1} \, d^T_{(1 \times 8)} \, d_{(8 \times 1)}

Transpose the weight matrix and convert it to a vector:

w_{(6 \times 1)} = \mathrm{vec}(W^T)

Rewrite the objective function as a scalar function of a vector:

l(w) = n^{-1} \, d^T_{(1 \times 8)} \, d_{(8 \times 1)}

Gradient Descent

Imagine a blind man in a land with many hills and valleys. He wants to reach the lowest point in the area. With each step, he taps his staff around himself to determine the slope of the land at his current position, then takes a step downward. He eventually reaches the bottom of a valley.

The gradient descent weight update rule with learning rate \alpha:

w_{t+1} = w_t - \alpha \left[ \frac{d l(w)}{d w} \right]^T

The transpose appears because the derivative is written below as a 1x6 row vector, while w is a 6x1 column vector.

Chain Rule

\left[ \frac{d l(w)}{d w} \right]_{1 \times 6} = \left[ \frac{d l(d)}{d d} \right]_{1 \times 8} \left[ \frac{d\, d(r)}{d r} \right]_{8 \times 8} \left[ \frac{d r(w)}{d w} \right]_{8 \times 6}

First Term

\frac{d l(d)}{d d} = \frac{d \left( n^{-1} d^T d \right)}{d d}

Pull the constant out of the derivative:

\frac{d l(d)}{d d} = n^{-1} \frac{d \left( d^T d \right)}{d d}

Use the derivative identity:

\frac{d l(d)}{d d} = n^{-1} \left( 2 d^T \right)

Second Term

\frac{d\, d(r)}{d r} = \frac{d (o - r)}{d r} = -I_{8 \times 8}

Response to a Single Stimulus Pattern

r_i = \begin{bmatrix} r_{i1} \\ r_{i2} \end{bmatrix}
    = \begin{bmatrix} w_1^T s_i \\ w_2^T s_i \end{bmatrix}
    = \begin{bmatrix} w_{11} s_{i1} + w_{12} s_{i2} + w_{13} s_{i3} \\ w_{21} s_{i1} + w_{22} s_{i2} + w_{23} s_{i3} \end{bmatrix}

where w_1^T and w_2^T are the rows of W.

Combined Response Vector as a Function of the Weights

r = \begin{bmatrix} r_{11} \\ r_{12} \\ r_{21} \\ r_{22} \\ r_{31} \\ r_{32} \\ r_{41} \\ r_{42} \end{bmatrix}
  = \begin{bmatrix} w_1^T s_1 \\ w_2^T s_1 \\ w_1^T s_2 \\ w_2^T s_2 \\ w_1^T s_3 \\ w_2^T s_3 \\ w_1^T s_4 \\ w_2^T s_4 \end{bmatrix}
  = \begin{bmatrix}
      w_{11} s_{11} + w_{12} s_{12} + w_{13} s_{13} \\
      w_{21} s_{11} + w_{22} s_{12} + w_{23} s_{13} \\
      w_{11} s_{21} + w_{12} s_{22} + w_{13} s_{23} \\
      w_{21} s_{21} + w_{22} s_{22} + w_{23} s_{23} \\
      w_{11} s_{31} + w_{12} s_{32} + w_{13} s_{33} \\
      w_{21} s_{31} + w_{22} s_{32} + w_{23} s_{33} \\
      w_{11} s_{41} + w_{12} s_{42} + w_{13} s_{43} \\
      w_{21} s_{41} + w_{22} s_{42} + w_{23} s_{43}
    \end{bmatrix}

Third Term

\frac{d r(w)}{d w} = \frac{d \begin{bmatrix} r_{11} & r_{12} & r_{21} & r_{22} & r_{31} & r_{32} & r_{41} & r_{42} \end{bmatrix}^T}{d \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{21} & w_{22} & w_{23} \end{bmatrix}^T}

Third Term as an 8x6 Matrix

\frac{d r(w)}{d w} =
\begin{bmatrix}
  s_{11} & s_{12} & s_{13} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{11} & s_{12} & s_{13} \\
  s_{21} & s_{22} & s_{23} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{21} & s_{22} & s_{23} \\
  s_{31} & s_{32} & s_{33} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{31} & s_{32} & s_{33} \\
  s_{41} & s_{42} & s_{43} & 0 & 0 & 0 \\
  0 & 0 & 0 & s_{41} & s_{42} & s_{43}
\end{bmatrix}_{8 \times 6}

Third Term as a Matrix of Vectors

\frac{d r(w)}{d w} =
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

where each s_i^T and 0^T is 1x3.
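As a sanity check on the third term, here is a minimal NumPy sketch. The data, the variable names S, W, and the helper response are my own illustrative assumptions, not part of the original exercise. It builds the 8x6 matrix one 2x6 block at a time, exactly as in the block form above, and compares it against a finite-difference estimate of dr/dw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example data: 4 stimulus patterns (3x1 each) and a 2x3 weight matrix.
S = rng.standard_normal((3, 4))   # S = [ s1 s2 s3 s4 ], 3x4
W = rng.standard_normal((2, 3))   # weight matrix W, 2x3
w = W.reshape(-1)                 # w = vec(W^T): the rows of W stacked, 6x1

def response(w_vec, S):
    # r stacked per pattern: [ r_11, r_12, r_21, r_22, ... ], 8x1
    W = w_vec.reshape(2, 3)
    return (W @ S).T.reshape(-1)

# Third term dr/dw: one 2x6 block [ s_i^T 0^T ; 0^T s_i^T ] per stimulus pattern.
J = np.vstack([np.kron(np.eye(2), S[:, i]) for i in range(S.shape[1])])  # 8x6

# Finite-difference check that the analytic block structure is correct.
eps = 1e-6
J_fd = np.zeros_like(J)
for k in range(w.size):
    e = np.zeros_like(w)
    e[k] = eps
    J_fd[:, k] = (response(w + e, S) - response(w - e, S)) / (2 * eps)

print(np.allclose(J, J_fd, atol=1e-8))  # expect True
```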
Combine the three terms:

\frac{d l(w)}{d w} = \left[ n^{-1} \left( 2 d^T \right) \right]_{1 \times 8} \left[ -I_{8 \times 8} \right]
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

Move the constants and drop the identity matrix:

\left[ \frac{d l(w)}{d w} \right]_{1 \times 6} = -2 n^{-1} \, d^T_{(1 \times 8)}
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}

The Weight Update Rule

Substitute the gradient into the gradient descent rule, transposing the 1x6 row vector so that the update matches the 6x1 column vector w:

w_{t+1} = w_t + 2 \alpha n^{-1}
\left( d^T_{(1 \times 8)}
\begin{bmatrix}
  s_1^T & 0^T \\
  0^T & s_1^T \\
  s_2^T & 0^T \\
  0^T & s_2^T \\
  s_3^T & 0^T \\
  0^T & s_3^T \\
  s_4^T & 0^T \\
  0^T & s_4^T
\end{bmatrix}_{8 \times 6}
\right)^T
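To illustrate the update rule in action, the following NumPy sketch applies it repeatedly, using made-up stimulus and target data and an assumed learning rate; none of these values come from the original exercise. The 8x6 third-term matrix J depends only on the stimuli, so it is built once before the loop.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training data: 4 stimulus patterns and their desired 2x1 outputs.
S = rng.standard_normal((3, 4))     # stimulus matrix S, 3x4
O = rng.standard_normal((2, 4))     # desired output matrix O, 2x4
n = S.shape[1]                      # number of stimulus patterns, n = 4
alpha = 0.05                        # learning rate (assumed value)

# Third-term matrix dr/dw, 8x6; fixed because the network is linear.
J = np.vstack([np.kron(np.eye(2), S[:, i]) for i in range(n)])

o = O.T.reshape(-1)                 # desired responses stacked per pattern, 8x1
w = np.zeros(6)                     # w = vec(W^T), initialized to zero

for t in range(1000):
    r = (w.reshape(2, 3) @ S).T.reshape(-1)   # actual responses, 8x1
    d = o - r                                 # difference vector, 8x1
    w = w + 2 * alpha / n * (d @ J)           # w_{t+1} = w_t + 2 a n^-1 (d^T J)^T

W = w.reshape(2, 3)                           # recover the trained weight matrix
print(np.sum((O - W @ S) ** 2) / n)           # average squared error l(W)
```

Since the network is linear, the same update can also be written directly on the weight matrix as W <- W + 2 \alpha n^{-1} (O - W S) S^T, which avoids forming the 8x6 matrix; the vectorized form above is kept only to mirror the derivation.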

Links