Search the FAQ Archives

3 - A - B - C - D - E - F - G - H - I - J - K - L - M
N - O - P - Q - R - S - T - U - V - W - X - Y - Z - Internet FAQ Archives FAQ, Part 7 of 7: Hardware
Section - How to learn an inverse of a function?

( Part1 - Part2 - Part3 - Part4 - Part5 - Part6 - Part7 - Single Page )
[ Usenet FAQs | Web FAQs | Documents | RFC Index | Business Photos and Profiles ]

Top Document: FAQ, Part 7 of 7: Hardware
Previous Document: How to forecast time series (temporal sequences)?
Next Document: How to get invariant recognition of images under
See reader questions & answers on this topic! - Help others by sharing your knowledge

Ordinarily, NNs learn a function Y = f(X), where Y is a vector of
outputs, X is a vector of inputs, and f() is the function to be learned.
Sometimes, however, you may want to learn an inverse of a function f(),
that is, given Y, you want to be able to find an X such that Y = f(X).
In general, there may be many different Xs that satisfy the equation Y =

For example, in robotics (DeMers and Kreutz-Delgado, 1996, 1997), X might
describe the positions of the joints in a robot's arm, while Y would
describe the location of the robot's hand. There are simple formulas to
compute the location of the hand given the positions of the joints, called
the "forward kinematics" problem. But there is no simple formula for the
"inverse kinematics" problem to compute positions of the joints that yield a
given location for the hand. Furthermore, if the arm has several joints,
there will usually be many different positions of the joints that yield the
same location of the hand, so the forward kinematics function is many-to-one
and has no unique inverse. Picking any X such that Y = f(X) is OK if
the only aim is to position the hand at Y. However if the aim is to
generate a series of points to move the hand through an arc this may be
insufficient. In this case the series of Xs need to be in the same "branch"
of the function space. Care must be taken to avoid solutions that yield
inefficient or impossible movements of the arm. 

As another example, consider an industrial process in which X represents
settings of control variables imposed by an operator, and Y represents
measurements of the product of the industrial process. The function Y =
f(X) can be learned by a NN using conventional training methods. But the
goal of the analysis may be to find control settings X that yield a product
with specified measurements Y, in which case an inverse of f(X) is
required. In industrial applications, financial considerations are
important, so not just any setting X that yields the desired result Y may
be acceptable. Perhaps a function can be specified that gives the cost of X
resulting from energy consumption, raw materials, etc., in which case you
would want to find the X that minimizes the cost function while satisfying
the equation Y = f(X). 

The obvious way to try to learn an inverse function is to generate a set of
training data from a given forward function, but designate Y as the input
and X as the output when training the network. Using a least-squares error
function, this approach will fail if f() is many-to-one. The problem is
that for an input Y, the net will not learn any single X such that Y =
f(X), but will instead learn the arithmetic mean of all the Xs in the
training set that satisfy the equation (Bishop, 1995, pp. 207-208). One
solution to this difficulty is to construct a network that learns a mixture
approximation to the conditional distribution of X given Y (Bishop, 1995,
pp. 212-221). However, the mixture method will not work well in general for
an X vector that is more than one-dimensional, such as Y = X_1^2 +
X_2^2, since the number of mixture components required may increase
exponentially with the dimensionality of X. And you are still left with the
problem of extracting a single output vector from the mixture distribution,
which is nontrivial if the mixture components overlap considerably. Another
solution is to use a highly robust error function, such as a redescending
M-estimator, that learns a single mode of the conditional distribution
instead of learning the mean (Huber, 1981; Rohwer and van der Rest 1996).
Additional regularization terms or constraints may be required to persuade
the network to choose appropriately among several modes, and there may be
severe problems with local optima. 

Another approach is to train a network to learn the forward mapping f()
and then numerically invert the function. Finding X such that Y = f(X)
is simply a matter of solving a nonlinear system of equations, for which
many algorithms can be found in the numerical analysis literature (Dennis
and Schnabel 1983). One way to solve nonlinear equations is turn the problem
into an optimization problem by minimizing sum(Y_i-f(X_i))^2. This
method fits in nicely with the usual gradient-descent methods for training
NNs (Kindermann and Linden 1990). Since the nonlinear equations will
generally have multiple solutions, there may be severe problems with local
optima, especially if some solutions are considered more desirable than
others. You can deal with multiple solutions by inventing some objective
function that measures the goodness of different solutions, and optimizing
this objective function under the nonlinear constraint Y = f(X) using
any of numerous algorithms for nonlinear programming (NLP; see Bertsekas,
1995, and other references under "What are conjugate gradients,
Levenberg-Marquardt, etc.?") The power and flexibility of the nonlinear
programming approach are offset by possibly high computational demands. 

If the forward mapping f() is obtained by training a network, there will
generally be some error in the network's outputs. The magnitude of this
error can be difficult to estimate. The process of inverting a network can
propagate this error, so the results should be checked carefully for
validity and numerical stability. Some training methods can produce not just
a point output but also a prediction interval (Bishop, 1995; White, 1992).
You can take advantage of prediction intervals when inverting a network by
using NLP methods. For example, you could try to find an X that minimizes
the width of the prediction interval under the constraint that the equation 
Y = f(X) is satisfied. Or instead of requiring Y = f(X) be satisfied
exactly, you could try to find an X such that the prediction interval is
contained within some specified interval while minimizing some cost

For more mathematics concerning the inverse-function problem, as well as
some interesting methods involving self-organizing maps, see DeMers and
Kreutz-Delgado (1996, 1997). For NNs that are relatively easy to invert, see
the Adaptive Logic Networks described in the software sections of the FAQ. 


   Bertsekas, D. P. (1995), Nonlinear Programming, Belmont, MA: Athena

   Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
   Oxford University Press. 

   DeMers, D., and Kreutz-Delgado, K. (1996), "Canonical Parameterization of
   Excess motor degrees of freedom with self organizing maps", IEEE Trans
   Neural Networks, 7, 43-55. 

   DeMers, D., and Kreutz-Delgado, K. (1997), "Inverse kinematics of
   dextrous manipulators," in Omidvar, O., and van der Smagt, P., (eds.) 
   Neural Systems for Robotics, San Diego: Academic Press, pp. 75-116. 

   Dennis, J.E. and Schnabel, R.B. (1983) Numerical Methods for
   Unconstrained Optimization and Nonlinear Equations, Prentice-Hall 

   Huber, P.J. (1981), Robust Statistics, NY: Wiley. 

   Kindermann, J., and Linden, A. (1990), "Inversion of Neural Networks by
   Gradient Descent," Parallel Computing, 14, 277-286,

   Rohwer, R., and van der Rest, J.C. (1996), "Minimum description length,
   regularization, and multimodal data," Neural Computation, 8, 595-609. 

   White, H. (1992), "Nonparametric Estimation of Conditional Quantiles
   Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings
   of the 23rd Sympsium on the Interface: Computing Science and Statistics,
   Alexandria, VA: American Statistical Association, pp. 190-199. 

User Contributions:

Comment about this article, ask questions, or add new information about this topic:


Top Document: FAQ, Part 7 of 7: Hardware
Previous Document: How to forecast time series (temporal sequences)?
Next Document: How to get invariant recognition of images under

Part1 - Part2 - Part3 - Part4 - Part5 - Part6 - Part7 - Single Page

[ Usenet FAQs | Web FAQs | Documents | RFC Index ]

Send corrections/additions to the FAQ Maintainer: (Warren Sarle)

Last Update March 27 2014 @ 02:11 PM