
Gradient descent in $n$-dimensional space in the context of an RBF network

By Mia Morrison

I am trying to implement an algorithm to perform gradient descent in an $n$-dimensional space in the context of an RBF network. My network has 5 inputs and 1 output, and its output is the following sum of Gaussian radial basis functions: $$f(x_i)=\sum_{k=1}^nw_ke^{-\lambda_k||x_i - \mu_k||^2}$$

where:

  • $n$ is the number of centroids in the network
  • $w_k$ is the weight between the hidden layer and the output layer for the $k^{th}$ centroid
  • $\lambda_k$ is the width parameter for the $k^{th}$ centroid
  • $x_i$ is the $i^{th}$ training data input (vector)
  • $\mu_k$ is the $k^{th}$ centroid (vector)

My goal is to find the optimal values of $w_k$ and $\lambda_k$ that minimize the squared error. I plan to do that iteratively by:

  1. Fix $\lambda_k$ and use the pseudo-inverse to solve for $w_k$
  2. Fix $w_k$ and solve for $\lambda_k$ using gradient descent by minimizing the error w.r.t. $\lambda_k$
  3. Repeat steps 1 and 2 until convergence is achieved (or any other criteria).
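This alternating scheme can be sketched roughly as follows. This is a non-authoritative sketch, not the definitive implementation: the inputs `X`, targets `y`, centroid positions `centers`, learning rate `mu`, and iteration count are all made-up placeholders (the targets are generated synthetically from the same basis so the fit is well behaved), and the gradient expression in step 2 comes from differentiating the squared-error function with respect to each $\lambda_k$.

```python
import numpy as np

def phi(X, centers, lam):
    """Design matrix: phi[i, k] = exp(-lam[k] * ||x_i - mu_k||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-lam[None, :] * d2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 5))        # 30 training inputs, 5 features (placeholder data)
centers = rng.uniform(size=(4, 5))   # 4 centroids, assumed already chosen
w_true = rng.normal(size=4)
y = phi(X, centers, np.full(4, 2.0)) @ w_true  # synthetic targets (true widths = 2)

lam = np.ones(4)                     # widths initialised to 1, as in the question
mu = 1e-3                            # placeholder learning rate

for _ in range(50):
    # step 1: fix lam, solve for w with the pseudo-inverse
    P = phi(X, centers, lam)
    w = np.linalg.pinv(P) @ y
    # step 2: fix w, take one gradient step on lam
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    resid = y - P @ w                # y_i - f(x_i)
    grad = 2.0 * w * (resid[:, None] * d2 * P).sum(axis=0)
    lam = lam - mu * grad
```

Convergence would in practice be checked by monitoring the squared error between iterations rather than running a fixed number of sweeps.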

I have already found the weights by fixing each $\lambda_k$ to 1 (arbitrarily) and using the pseudo-inverse matrix. Now I need to find values for each $\lambda_k$ that minimize the squared error. From what I understand, I need to:
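That pseudo-inverse step can be written concretely as below. This is only a sketch: `X`, `y`, and `centers` are made-up placeholders standing in for the real training set and centroids, and the design matrix assumes the Gaussian basis from the question.

```python
import numpy as np

def design_matrix(X, centers, lam):
    """Phi[i, k] = exp(-lam[k] * ||x_i - mu_k||^2), the Gaussian basis above."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-lam[None, :] * d2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 5))        # 10 training inputs, 5 features
y = rng.normal(size=10)              # known outputs from the training set
centers = rng.uniform(size=(3, 5))   # 3 centroids
lam = np.ones(3)                     # all widths fixed to 1, as in the question

# least-squares weights for fixed widths: w = pinv(Phi) @ y
Phi = design_matrix(X, centers, lam)
w = np.linalg.pinv(Phi) @ y
```

The pseudo-inverse gives the least-squares solution, so the residual `Phi @ w - y` is orthogonal to the columns of `Phi`.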

  1. Find the partial derivative (w.r.t. $\lambda_k$) of the error function: $$S=\sum_{i=1}^p(y_i - f(x_i))^2$$ where $f(x_i)$ is the function above, $y_i$ is the actual output for the $i^{th}$ training example, and $p$ is the number of entries in the training set.
  2. Evaluate that partial derivative at the current values of $\lambda_k$.
  3. Update each $\lambda_k$ by stepping in the direction of the negative gradient, and repeat until convergence.

I do not understand well how to perform gradient descent w.r.t. $\lambda_k$.

Can anyone help me clarify how to perform gradient descent in an $n$-dimensional space? I need to find a vector of updates $\delta_k$ to apply to each $\lambda_k$, so that $\lambda_k := \lambda_k + \delta_k$.

My current algorithm for one iteration of gradient descent:

for each item x in the training data set:
    feed forward x in the network and get the output
    for each centroid k:
        λk := λk - μ * (gradient of the cost function w.r.t. λk)   // μ is a constant learning rate
    end loop
end loop

where:

$$\frac{\partial S}{\partial\lambda_k} = \sum_{i=1}^p 2\,w_k\,d_{ik}^2\,(y_i - f(x_i))\,e^{-\lambda_k d_{ik}^2}$$

where $d_{ik} = ||x_i - \mu_k||$ is the Euclidean distance between the $i^{th}$ input and the $k^{th}$ centroid, and $y_i$ is the output from the training set; both are known values.
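Using the squared-error function $S$ and the Gaussian basis from the question, a single vectorised gradient-descent update on the $\lambda_k$ can be sketched as follows. The data, weights, and learning rate `mu` are made-up placeholders, and the gradient implemented here is my reading of $\partial S/\partial\lambda_k$, not a verified reference.

```python
import numpy as np

def rbf_output(X, centers, lam, w):
    """f(x_i) = sum_k w_k * exp(-lam[k] * ||x_i - mu_k||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-lam[None, :] * d2) @ w

def grad_lambda(X, y, centers, lam, w):
    """dS/dlam_k = sum_i 2 * w_k * d_ik^2 * (y_i - f(x_i)) * exp(-lam_k * d_ik^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-lam[None, :] * d2)
    resid = y - P @ w                # y_i - f(x_i)
    return 2.0 * w * (resid[:, None] * d2 * P).sum(axis=0)

rng = np.random.default_rng(1)
X = rng.uniform(size=(20, 5))        # placeholder training inputs
centers = rng.uniform(size=(4, 5))   # placeholder centroids
lam = np.ones(4)                     # current widths
w = rng.normal(size=4)               # weights held fixed in this step
y = rng.normal(size=20)              # placeholder training outputs
mu = 0.01                            # placeholder learning rate

# one batch gradient-descent step: lam := lam - mu * dS/dlam
lam_new = lam - mu * grad_lambda(X, y, centers, lam, w)
```

Note the minus sign: since $S$ is being minimized, each $\lambda_k$ moves against its partial derivative, matching the pseudocode above.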

Thank you very much!
