There are many kinds of NNs by now, and nobody knows exactly how
many; new ones (or at least variations of old ones) are invented
every week. Below is a collection of some of the most well-known
methods, not claiming to be complete.

The two main kinds of learning algorithms are supervised and
unsupervised:

 o In supervised learning, the correct results (target values,
   desired outputs) are known and are given to the NN during training
   so that the NN can adjust its weights to try to match its outputs
   to the target values. After training, the NN is tested by giving
   it only input values, not target values, and seeing how close it
   comes to outputting the correct target values.

 o In unsupervised learning, the NN is not provided with the correct
   results during training. Unsupervised NNs usually perform some
   kind of data compression, such as dimensionality reduction or
   clustering. See "What does unsupervised learning learn?"

The distinction between supervised and unsupervised methods is not
always clear-cut. An unsupervised method can learn a summary of a
probability distribution, and the summarized distribution can then be
used to make predictions. Furthermore, supervised methods come in two
subvarieties: autoassociative and heteroassociative. In
autoassociative learning, the target values are the same as the
inputs, whereas in heteroassociative learning, the targets are
generally different from the inputs. Many unsupervised methods are
equivalent to autoassociative supervised methods. For more details,
see "What does unsupervised learning learn?"

Two major kinds of network topology are feedforward and feedback:

 o In a feedforward NN, the connections between units do not form
   cycles. Feedforward NNs usually produce a response to an input
   quickly. Most feedforward NNs can be trained using a wide variety
   of efficient conventional numerical methods (e.g. see "What are
   conjugate gradients, Levenberg-Marquardt, etc.?") in addition to
   algorithms invented by NN researchers.

 o In a feedback or recurrent NN, there are cycles in the
   connections. In some feedback NNs, each time an input is
   presented, the NN must iterate for a potentially long time before
   it produces a response. Feedback NNs are usually more difficult to
   train than feedforward NNs.

Some kinds of NNs (such as those with winner-take-all units) can be
implemented as either feedforward or feedback networks.

NNs also differ in the kinds of data they accept. Two major kinds of
data are categorical and quantitative:

 o Categorical variables take only a finite (technically, countable)
   number of possible values, and there are usually several or more
   cases falling into each category. Categorical variables may have
   symbolic values (e.g., "male" and "female", or "red", "green" and
   "blue") that must be encoded into numbers before being given to
   the network (see "How should categories be encoded?"). Both
   supervised learning with categorical target values and
   unsupervised learning with categorical outputs are called
   "classification."

 o Quantitative variables are numerical measurements of some
   attribute, such as length in meters. The measurements must be made
   in such a way that at least some arithmetic relations among the
   measurements reflect analogous relations among the attributes of
   the objects that are measured. For more information on measurement
   theory, see the Measurement Theory FAQ at
   ftp://ftp.sas.com/pub/neural/measurement.html. Supervised learning
   with quantitative target values is called "regression."

Some variables can be treated as either categorical or quantitative,
such as number of children or any binary variable. Most regression
algorithms can also be used for supervised classification by encoding
categorical target values as 0/1 binary variables and using those
binary variables as target values for the regression algorithm. With
any of the most common training methods, the outputs of the network
can then be interpreted as estimates of the posterior probabilities
of class membership. The short code sketches below illustrate each of
these distinctions in miniature.
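First, supervised learning: a minimal sketch of a single linear unit
trained with the Widrow-Hoff delta rule (the Adaline rule listed in
the taxonomy below). The data, learning rate, and epoch count here
are invented purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # inputs
    w_true = np.array([1.0, -2.0, 0.5])
    t = X @ w_true + 0.1 * rng.normal(size=100)   # known target values

    w = np.zeros(3)                               # adjustable weights
    lr = 0.01                                     # learning rate
    for epoch in range(50):
        for x, target in zip(X, t):
            y = w @ x                             # output of the unit
            w += lr * (target - y) * x            # nudge output toward target

    print("learned weights:", w)                  # close to w_true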
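Unsupervised learning can be sketched just as briefly. The toy
example below does simple competitive learning (online vector
quantization, as in the unsupervised competitive branch of the
taxonomy): no target values are given, yet the two codebook vectors
drift toward the cluster centers on their own. Data and constants
are again made up:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),   # two clusters,
                   rng.normal(+2.0, 0.5, size=(50, 2))])  # unlabeled

    codebook = rng.normal(size=(2, 2))   # two prototype vectors
    lr = 0.05
    for epoch in range(20):
        for x in rng.permutation(X):     # present cases in random order
            winner = np.argmin(((codebook - x) ** 2).sum(axis=1))
            codebook[winner] += lr * (x - codebook[winner])

    print(codebook)                      # near (-2, -2) and (+2, +2)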
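The autoassociative case, where the targets equal the inputs, looks
like this: a linear network with a two-unit bottleneck trained to
reproduce its own five-dimensional input ends up doing dimensionality
reduction, which is one reason many unsupervised methods are
equivalent to autoassociative supervised ones. This is a hand-tuned
toy (step size and epoch count picked by trial), not a recommended
training procedure:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))  # 5-D data
    X -= X.mean(axis=0)                  # ... lying on a 2-D plane

    A = rng.normal(scale=0.1, size=(5, 2))   # encoder weights
    B = rng.normal(scale=0.1, size=(2, 5))   # decoder weights
    lr = 0.02
    for epoch in range(1000):
        E = X @ A @ B - X                # error: reconstruction - input
        A -= lr * X.T @ E @ B.T / len(X) # gradient descent on squared
        B -= lr * A.T @ X.T @ E / len(X) # reconstruction error

    print("reconstruction MSE:", (E ** 2).mean())  # far below X.var()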
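The topological distinction can also be seen directly in code. Below,
a feedforward net computes its response in a single pass, while an
Elman-style recurrent net carries a hidden state around a cycle, so
its response depends on the whole input sequence seen so far. The
weights are random placeholders, chosen only to show the shape of the
computations:

    import numpy as np

    def f(a):                            # logistic activation
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(3)
    W1 = rng.normal(size=(4, 3))         # input -> hidden
    W2 = rng.normal(size=(1, 4))         # hidden -> output

    # Feedforward: no cycles, so one pass yields the response.
    x = rng.normal(size=3)
    y = f(W2 @ f(W1 @ x))

    # Feedback (recurrent): the hidden state feeds back into itself.
    Wx = rng.normal(size=(4, 3))
    Wh = rng.normal(size=(4, 4))
    h = np.zeros(4)
    for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
        h = f(Wx @ x_t + Wh @ h)         # cycle: h depends on previous h
    y_seq = f(W2 @ h)                    # response after the sequence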
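Finally, to illustrate the last two points together, this sketch
one-hot encodes a symbolic categorical variable and then uses an
ordinary regression method (linear least squares) for classification
by regressing the 0/1 indicator targets on the inputs. The fitted
outputs are estimates of the posterior probabilities, although a
plain linear fit can stray outside [0, 1]. The data are invented:

    import numpy as np

    colors = np.array(["red", "green", "blue", "green", "red", "blue"])
    categories = np.unique(colors)           # ['blue', 'green', 'red']
    targets = (colors[:, None] == categories).astype(float)
    # one-hot encoding: "red" -> [0, 0, 1], "green" -> [0, 1, 0], ...

    rng = np.random.default_rng(4)
    X = rng.normal(size=(6, 2))              # toy inputs, one row per case
    X1 = np.hstack([np.ones((6, 1)), X])     # prepend a bias column
    B, *_ = np.linalg.lstsq(X1, targets, rcond=None)

    posteriors = X1 @ B   # rows sum to 1; entries estimate P(class | x)

With a bias column and indicator targets whose rows sum to 1, the
fitted rows sum to 1 exactly, which is what makes the posterior
interpretation natural.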
Here are some well-known kinds of NNs:

1. Supervised

   1. Feedforward
      o Linear
        o Hebbian - Hebb (1949), Fausett (1994)
        o Perceptron - Rosenblatt (1958), Minsky and Papert
          (1969/1988), Fausett (1994)
        o Adaline - Widrow and Hoff (1960), Fausett (1994)
        o Higher Order - Bishop (1995)
        o Functional Link - Pao (1989)
      o MLP: Multilayer perceptron - Bishop (1995), Reed and Marks
        (1999), Fausett (1994); a minimal sketch follows this list
        o Backprop - Rumelhart, Hinton, and Williams (1986)
        o Cascade Correlation - Fahlman and Lebiere (1990), Fausett
          (1994)
        o Quickprop - Fahlman (1989)
        o RPROP - Riedmiller and Braun (1993)
      o RBF networks - Bishop (1995), Moody and Darken (1989), Orr
        (1996)
        o OLS: Orthogonal Least Squares - Chen, Cowan and Grant (1991)
      o CMAC: Cerebellar Model Articulation Controller - Albus (1975),
        Brown and Harris (1994)
      o Classification only
        o LVQ: Learning Vector Quantization - Kohonen (1988), Fausett
          (1994)
        o PNN: Probabilistic Neural Network - Specht (1990), Masters
          (1993), Hand (1982), Fausett (1994)
      o Regression only
        o GRNN: General Regression Neural Network - Specht (1991),
          Nadaraya (1964), Watson (1964)

   2. Feedback - Hertz, Krogh, and Palmer (1991), Medsker and Jain
      (2000)
      o BAM: Bidirectional Associative Memory - Kosko (1992), Fausett
        (1994)
      o Boltzmann Machine - Ackley et al. (1985), Fausett (1994)
      o Recurrent time series
        o Backpropagation through time - Werbos (1990)
        o Elman - Elman (1990)
        o FIR: Finite Impulse Response - Wan (1990)
        o Jordan - Jordan (1986)
        o Real-time recurrent network - Williams and Zipser (1989)
        o Recurrent backpropagation - Pineda (1989), Fausett (1994)
        o TDNN: Time Delay NN - Lang, Waibel and Hinton (1990)

   3. Competitive
      o ARTMAP - Carpenter, Grossberg and Reynolds (1991)
      o Fuzzy ARTMAP - Carpenter, Grossberg, Markuzon, Reynolds and
        Rosen (1992), Kasuba (1993)
      o Gaussian ARTMAP - Williamson (1995)
      o Counterpropagation - Hecht-Nielsen (1987; 1988; 1990), Fausett
        (1994)
      o Neocognitron - Fukushima, Miyake, and Ito (1983), Fukushima
        (1988), Fausett (1994)

2. Unsupervised - Hertz, Krogh, and Palmer (1991)

   1. Competitive
      o Vector Quantization
        o Grossberg - Grossberg (1976)
        o Kohonen - Kohonen (1984)
        o Conscience - Desieno (1988)
      o Self-Organizing Map
        o Kohonen - Kohonen (1995), Fausett (1994)
        o GTM: Generative Topographic Mapping - Bishop, Svensén and
          Williams (1997)
        o Local Linear - Mulier and Cherkassky (1995)
      o Adaptive resonance theory
        o ART 1 - Carpenter and Grossberg (1987a), Moore (1988),
          Fausett (1994)
        o ART 2 - Carpenter and Grossberg (1987b), Fausett (1994)
        o ART 2-A - Carpenter, Grossberg and Rosen (1991a)
        o ART 3 - Carpenter and Grossberg (1990)
        o Fuzzy ART - Carpenter, Grossberg and Rosen (1991b)
      o DCL: Differential Competitive Learning - Kosko (1992)

   2. Dimension Reduction - Diamantaras and Kung (1996)
      o Hebbian - Hebb (1949), Fausett (1994)
      o Oja - Oja (1989)
      o Sanger - Sanger (1989)
      o Differential Hebbian - Kosko (1992)

   3. Autoassociation
      o Linear autoassociator - Anderson et al. (1977), Fausett (1994)
      o BSB: Brain State in a Box - Anderson et al. (1977), Fausett
        (1994)
      o Hopfield - Hopfield (1982), Fausett (1994)

3. Nonlearning

   1. Hopfield - Hertz, Krogh, and Palmer (1991)
   2. Various networks for optimization - Cichocki and Unbehauen
      (1993)
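To make the best-known entry above concrete (MLP trained with
backprop; Rumelhart, Hinton, and Williams, 1986), here is a
bare-bones multilayer perceptron learning XOR by gradient descent on
squared error. It is a minimal sketch of the standard algorithm, not
code from any of the references; the seed, learning rate, and epoch
count are arbitrary, and sigmoid networks on XOR occasionally stall
in a local minimum, so a different seed may be needed:

    import numpy as np

    rng = np.random.default_rng(5)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
    t = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

    W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
    f = lambda a: 1.0 / (1.0 + np.exp(-a))                  # logistic

    lr = 0.5
    for epoch in range(20000):
        h = f(X @ W1 + b1)                 # forward pass
        y = f(h @ W2 + b2)
        d_y = (y - t) * y * (1 - y)        # backward pass: output deltas
        d_h = (d_y @ W2.T) * h * (1 - h)   # hidden deltas via chain rule
        W2 -= lr * h.T @ d_y; b2 -= lr * d_y.sum(axis=0)
        W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(axis=0)

    print(np.round(y, 2))                  # approx. [[0], [1], [1], [0]]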
References:

   Ackley, D.H., Hinton, G.E., and Sejnowski, T.J. (1985), "A learning
   algorithm for Boltzmann machines," Cognitive Science, 9, 147-169.

   Albus, J.S. (1975), "New Approach to Manipulator Control: The
   Cerebellar Model Articulation Controller (CMAC)," Transactions of
   the ASME, Journal of Dynamic Systems, Measurement, and Control,
   September 1975, 220-227.

   Anderson, J.A., and Rosenfeld, E., eds. (1988), Neurocomputing:
   Foundations of Research, Cambridge, MA: The MIT Press.

   Anderson, J.A., Silverstein, J.W., Ritz, S.A., and Jones, R.S.
   (1977), "Distinctive features, categorical perception, and
   probability learning: Some applications of a neural model,"
   Psychological Review, 84, 413-451. Reprinted in Anderson and
   Rosenfeld (1988).

   Bishop, C.M. (1995), Neural Networks for Pattern Recognition,
   Oxford: Oxford University Press.

   Bishop, C.M., Svensén, M., and Williams, C.K.I. (1997), "GTM: A
   principled alternative to the self-organizing map," in Mozer, M.C.,
   Jordan, M.I., and Petsche, T., eds., Advances in Neural Information
   Processing Systems 9, Cambridge, MA: The MIT Press, pp. 354-360.
   Also see http://www.ncrg.aston.ac.uk/GTM/

   Brown, M., and Harris, C. (1994), Neurofuzzy Adaptive Modelling and
   Control, NY: Prentice Hall.

   Carpenter, G.A., and Grossberg, S. (1987a), "A massively parallel
   architecture for a self-organizing neural pattern recognition
   machine," Computer Vision, Graphics, and Image Processing, 37,
   54-115.

   Carpenter, G.A., and Grossberg, S. (1987b), "ART 2:
   Self-organization of stable category recognition codes for analog
   input patterns," Applied Optics, 26, 4919-4930.

   Carpenter, G.A., and Grossberg, S. (1990), "ART 3: Hierarchical
   search using chemical transmitters in self-organizing pattern
   recognition architectures," Neural Networks, 3, 129-152.

   Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., and
   Rosen, D.B. (1992), "Fuzzy ARTMAP: A neural network architecture
   for incremental supervised learning of analog multidimensional
   maps," IEEE Transactions on Neural Networks, 3, 698-713.

   Carpenter, G.A., Grossberg, S., and Reynolds, J.H. (1991),
   "ARTMAP: Supervised real-time learning and classification of
   nonstationary data by a self-organizing neural network," Neural
   Networks, 4, 565-588.

   Carpenter, G.A., Grossberg, S., and Rosen, D.B. (1991a), "ART 2-A:
   An adaptive resonance algorithm for rapid category learning and
   recognition," Neural Networks, 4, 493-504.

   Carpenter, G.A., Grossberg, S., and Rosen, D.B. (1991b), "Fuzzy
   ART: Fast stable learning and categorization of analog patterns by
   an adaptive resonance system," Neural Networks, 4, 759-771.

   Chen, S., Cowan, C.F.N., and Grant, P.M. (1991), "Orthogonal least
   squares learning for radial basis function networks," IEEE
   Transactions on Neural Networks, 2, 302-309.

   Cichocki, A., and Unbehauen, R. (1993), Neural Networks for
   Optimization and Signal Processing, NY: John Wiley & Sons, ISBN
   0471930105.

   Desieno, D. (1988), "Adding a conscience to competitive learning,"
   Proc. Int. Conf. on Neural Networks, I, 117-124, IEEE Press.

   Diamantaras, K.I., and Kung, S.Y. (1996), Principal Component
   Neural Networks: Theory and Applications, NY: Wiley.

   Elman, J.L. (1990), "Finding structure in time," Cognitive Science,
   14, 179-211.
(1989), "FasterLearning Variations on BackPropagation: An Empirical Study", in Touretzky, D., Hinton, G, and Sejnowski, T., eds., Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 3851. Fahlman, S.E., and Lebiere, C. (1990), "The CascadeCorrelation Learning Architecture", in Touretzky, D. S. (ed.), Advances in Neural Information Processing Systems 2,, Los Altos, CA: Morgan Kaufmann Publishers, pp. 524532. Fausett, L. (1994), Fundamentals of Neural Networks, Englewood Cliffs, NJ: Prentice Hall. Fukushima, K., Miyake, S., and Ito, T. (1983), "Neocognitron: A neural network model for a mechanism of visual pattern recognition," IEEE Transactions on Systems, Man, and Cybernetics, 13, 826834. Fukushima, K. (1988), "Neocognitron: A hierarchical neural network capable of visual pattern recognition," Neural Networks, 1, 119130. Grossberg, S. (1976), "Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors," Biological Cybernetics, 23, 121134 Hand, D.J. (1982) Kernel Discriminant Analysis, Research Studies Press. Hebb, D.O. (1949), The Organization of Behavior, NY: John Wiley & Sons. HechtNielsen, R. (1987), "Counterpropagation networks," Applied Optics, 26, 49794984. HechtNielsen, R. (1988), "Applications of counterpropagation networks," Neural Networks, 1, 131139. HechtNielsen, R. (1990), Neurocomputing, Reading, MA: AddisonWesley. Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation. AddisonWesley: Redwood City, California. Hopfield, J.J. (1982), "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, 79, 25542558. Reprinted in Anderson and Rosenfeld (1988). Jordan, M. I. (1986), "Attractor dynamics and parallelism in a connectionist sequential machine," In Proceedings of the Eighth Annual conference of the Cognitive Science Society, pages 531546. Lawrence Erlbaum. Kasuba, T. (1993), "Simplified Fuzzy ARTMAP," AI Expert, 8, 1825. Kohonen, T. (1984), SelfOrganization and Associative Memory, Berlin: Springer. Kohonen, T. (1988), "Learning Vector Quantization," Neural Networks, 1 (suppl 1), 303. Kohonen, T. (1995/1997), SelfOrganizing Maps, Berlin: SpringerVerlag. First edition was 1995, second edition 1997. See http://www.cis.hut.fi/nnrc/new_book.html for information on the second edition. Kosko, B.(1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, N.J.: PrenticeHall. Lang, K. J., Waibel, A. H., and Hinton, G. (1990), "A timedelay neural network architecture for isolated word recognition," Neural Networks, 3, 2344. Masters, T. (1993). Practical Neural Network Recipes in C++, San Diego: Academic Press. Masters, T. (1995) Advanced Algorithms for Neural Networks: A C++ Sourcebook, NY: John Wiley and Sons, ISBN 0471105880 Medsker, L.R., and Jain, L.C., eds. (2000), Recurrent Neural Networks: Design and Applications, Boca Raton, FL: CRC Press, ISBN 0849371813. Minsky, M.L., and Papert, S.A. (1969/1988), Perceptrons, Cambridge, MA: The MIT Press (first edition, 1969; expanded edition, 1988). Moody, J. and Darken, C.J. (1989), "Fast learning in networks of locallytuned processing units," Neural Computation, 1, 281294. Moore, B. (1988), "ART 1 and Pattern Clustering," in Touretzky, D., Hinton, G. and Sejnowski, T., eds., Proceedings of the 1988 Connectionist Models Summer School, 174185, San Mateo, CA: Morgan Kaufmann. Mulier, F. and Cherkassky, V. 
(1995), "SelfOrganization as an Iterative Kernel Smoothing Process," Neural Computation, 7, 11651177. Nadaraya, E.A. (1964) "On estimating regression", Theory Probab. Applic. 10, 18690. Oja, E. (1989), "Neural networks, principal components, and subspaces," International Journal of Neural Systems, 1, 6168. Orr, M.J.L. (1996), "Introduction to radial basis function networks," http://www.anc.ed.ac.uk/~mjo/papers/intro.ps or http://www.anc.ed.ac.uk/~mjo/papers/intro.ps.gz Pao, Y. H. (1989), Adaptive Pattern Recognition and Neural Networks, Reading, MA: AddisonWesley Publishing Company, ISBN 0201125846. Pineda, F.J. (1989), "Recurrent backpropagation and the dynamical approach to neural computation," Neural Computation, 1, 161172. Reed, R.D., and Marks, R.J, II (1999), Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262181908. Riedmiller, M. and Braun, H. (1993), "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm", Proceedings of the IEEE International Conference on Neural Networks 1993, San Francisco: IEEE. Rosenblatt, F. (1958), "The perceptron: A probabilistic model for information storage and organization in the brain., Psychological Review, 65, 386408. Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986), "Learning internal representations by error propagation", in Rumelhart, D.E. and McClelland, J. L., eds. (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1, 318362, Cambridge, MA: The MIT Press. Sanger, T.D. (1989), "Optimal unsupervised learning in a singlelayer linear feedforward neural network," Neural Networks, 2, 459473. Specht, D.F. (1990) "Probabilistic neural networks," Neural Networks, 3, 110118. Specht, D.F. (1991) "A Generalized Regression Neural Network", IEEE Transactions on Neural Networks, 2, Nov. 1991, 568576. Wan, E.A. (1990), "Temporal backpropagation: An efficient algorithm for finite impulse response neural networks," in Proceedings of the 1990 Connectionist Models Summer School, Touretzky, D.S., Elman, J.L., Sejnowski, T.J., and Hinton, G.E., eds., San Mateo, CA: Morgan Kaufmann, pp. 131140. Watson, G.S. (1964) "Smooth regression analysis", Sankhy{\=a}, Series A, 26, 35972. Werbos, P.J. (1990), "Backpropagtion through time: What it is and how to do it," Proceedings of the IEEE, 78, 15501560. Widrow, B., and Hoff, M.E., Jr., (1960), "Adaptive switching circuits," IRE WESCON Convention Record. part 4, pp. 96104. Reprinted in Anderson and Rosenfeld (1988). Williams, R.J., and Zipser, D., (1989), "A learning algorithm for continually running fully recurrent neurla networks," Neural Computation, 1, 270280. Williamson, J.R. (1995), "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps," Technical Report CAS/CNS95003, Boston University, Center of Adaptive Systems and Department of Cognitive and Neural Systems. User Contributions:Comment about this article, ask questions, or add new information about this topic:Top Document: comp.ai.neuralnets FAQ, Part 1 of 7: Introduction Previous Document: Who is concerned with NNs? Next Document: How many kinds of Kohonen networks exist? Part1  Part2  Part3  Part4  Part5  Part6  Part7  Single Page [ Usenet FAQs  Web FAQs  Documents  RFC Index ] Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
