Top Document: comp.ai.neuralnets FAQ, Part 7 of 7: Hardware Previous Document: How to forecast time series (temporal sequences)? Next Document: How to get invariant recognition of images under See reader questions & answers on this topic!  Help others by sharing your knowledge Ordinarily, NNs learn a function Y = f(X), where Y is a vector of outputs, X is a vector of inputs, and f() is the function to be learned. Sometimes, however, you may want to learn an inverse of a function f(), that is, given Y, you want to be able to find an X such that Y = f(X). In general, there may be many different Xs that satisfy the equation Y = f(X). For example, in robotics (DeMers and KreutzDelgado, 1996, 1997), X might describe the positions of the joints in a robot's arm, while Y would describe the location of the robot's hand. There are simple formulas to compute the location of the hand given the positions of the joints, called the "forward kinematics" problem. But there is no simple formula for the "inverse kinematics" problem to compute positions of the joints that yield a given location for the hand. Furthermore, if the arm has several joints, there will usually be many different positions of the joints that yield the same location of the hand, so the forward kinematics function is manytoone and has no unique inverse. Picking any X such that Y = f(X) is OK if the only aim is to position the hand at Y. However if the aim is to generate a series of points to move the hand through an arc this may be insufficient. In this case the series of Xs need to be in the same "branch" of the function space. Care must be taken to avoid solutions that yield inefficient or impossible movements of the arm. As another example, consider an industrial process in which X represents settings of control variables imposed by an operator, and Y represents measurements of the product of the industrial process. The function Y = f(X) can be learned by a NN using conventional training methods. But the goal of the analysis may be to find control settings X that yield a product with specified measurements Y, in which case an inverse of f(X) is required. In industrial applications, financial considerations are important, so not just any setting X that yields the desired result Y may be acceptable. Perhaps a function can be specified that gives the cost of X resulting from energy consumption, raw materials, etc., in which case you would want to find the X that minimizes the cost function while satisfying the equation Y = f(X). The obvious way to try to learn an inverse function is to generate a set of training data from a given forward function, but designate Y as the input and X as the output when training the network. Using a leastsquares error function, this approach will fail if f() is manytoone. The problem is that for an input Y, the net will not learn any single X such that Y = f(X), but will instead learn the arithmetic mean of all the Xs in the training set that satisfy the equation (Bishop, 1995, pp. 207208). One solution to this difficulty is to construct a network that learns a mixture approximation to the conditional distribution of X given Y (Bishop, 1995, pp. 212221). However, the mixture method will not work well in general for an X vector that is more than onedimensional, such as Y = X_1^2 + X_2^2, since the number of mixture components required may increase exponentially with the dimensionality of X. And you are still left with the problem of extracting a single output vector from the mixture distribution, which is nontrivial if the mixture components overlap considerably. Another solution is to use a highly robust error function, such as a redescending Mestimator, that learns a single mode of the conditional distribution instead of learning the mean (Huber, 1981; Rohwer and van der Rest 1996). Additional regularization terms or constraints may be required to persuade the network to choose appropriately among several modes, and there may be severe problems with local optima. Another approach is to train a network to learn the forward mapping f() and then numerically invert the function. Finding X such that Y = f(X) is simply a matter of solving a nonlinear system of equations, for which many algorithms can be found in the numerical analysis literature (Dennis and Schnabel 1983). One way to solve nonlinear equations is turn the problem into an optimization problem by minimizing sum(Y_if(X_i))^2. This method fits in nicely with the usual gradientdescent methods for training NNs (Kindermann and Linden 1990). Since the nonlinear equations will generally have multiple solutions, there may be severe problems with local optima, especially if some solutions are considered more desirable than others. You can deal with multiple solutions by inventing some objective function that measures the goodness of different solutions, and optimizing this objective function under the nonlinear constraint Y = f(X) using any of numerous algorithms for nonlinear programming (NLP; see Bertsekas, 1995, and other references under "What are conjugate gradients, LevenbergMarquardt, etc.?") The power and flexibility of the nonlinear programming approach are offset by possibly high computational demands. If the forward mapping f() is obtained by training a network, there will generally be some error in the network's outputs. The magnitude of this error can be difficult to estimate. The process of inverting a network can propagate this error, so the results should be checked carefully for validity and numerical stability. Some training methods can produce not just a point output but also a prediction interval (Bishop, 1995; White, 1992). You can take advantage of prediction intervals when inverting a network by using NLP methods. For example, you could try to find an X that minimizes the width of the prediction interval under the constraint that the equation Y = f(X) is satisfied. Or instead of requiring Y = f(X) be satisfied exactly, you could try to find an X such that the prediction interval is contained within some specified interval while minimizing some cost function. For more mathematics concerning the inversefunction problem, as well as some interesting methods involving selforganizing maps, see DeMers and KreutzDelgado (1996, 1997). For NNs that are relatively easy to invert, see the Adaptive Logic Networks described in the software sections of the FAQ. References: Bertsekas, D. P. (1995), Nonlinear Programming, Belmont, MA: Athena Scientific. Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press. DeMers, D., and KreutzDelgado, K. (1996), "Canonical Parameterization of Excess motor degrees of freedom with self organizing maps", IEEE Trans Neural Networks, 7, 4355. DeMers, D., and KreutzDelgado, K. (1997), "Inverse kinematics of dextrous manipulators," in Omidvar, O., and van der Smagt, P., (eds.) Neural Systems for Robotics, San Diego: Academic Press, pp. 75116. Dennis, J.E. and Schnabel, R.B. (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations, PrenticeHall Huber, P.J. (1981), Robust Statistics, NY: Wiley. Kindermann, J., and Linden, A. (1990), "Inversion of Neural Networks by Gradient Descent," Parallel Computing, 14, 277286, ftp://icsi.Berkeley.EDU/pub/ai/linden/ Rohwer, R., and van der Rest, J.C. (1996), "Minimum description length, regularization, and multimodal data," Neural Computation, 8, 595609. White, H. (1992), "Nonparametric Estimation of Conditional Quantiles Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings of the 23rd Sympsium on the Interface: Computing Science and Statistics, Alexandria, VA: American Statistical Association, pp. 190199. User Contributions:1 Yasmin Aug 22, 2014 @ 9:09 am I'm writing a Matlab code for face recognition , feature extraction has been done using PCA and the classifier is a back Propagation Neural Network, i have a strange problem, that every time i run my code, it gives me a random different output, although i'm using the same parameters of BPNN (momentum, learning rate, no. of epochs, goal , ...etc,) and of course the same input ,,, what is the problem ? is there is any thing in the back propagation neural network that randomize the output ?? where is the problem i don't know ? Comment about this article, ask questions, or add new information about this topic:Top Document: comp.ai.neuralnets FAQ, Part 7 of 7: Hardware Previous Document: How to forecast time series (temporal sequences)? Next Document: How to get invariant recognition of images under Part1  Part2  Part3  Part4  Part5  Part6  Part7  Single Page [ Usenet FAQs  Web FAQs  Documents  RFC Index ] Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM
