comp.ai.neural-nets FAQ, Part 4 of 7: Books, data, etc.

The following search engines will search many bookstores for new and used books and return information on availability, price, and shipping charges:

   AddAll: http://www.addall.com/
   Bookfinder: http://www.bookfinder.com/

Clicking on the author and title of most of the books listed in the "Best" and "Notable" sections will do a search using AddAll.

There are many online bookstores, such as:

   Amazon: http://www.amazon.com/
   Amazon, UK: http://www.amazon.co.uk/
   Amazon, Germany: http://www.amazon.de/
   Barnes & Noble: http://www.bn.com/
   Bookpool: http://www.bookpool.com/
   Borders: http://www.borders.com/
   Fatbrain: http://www.fatbrain.com/

The neural networks reading group at the University of Illinois at Urbana-Champaign, the Artificial Neural Networks and Computational Brain Theory (ANNCBT) forum, has compiled a large number of book and paper reviews at http://anncbt.ai.uiuc.edu/, with more emphasis on cognitive science than on practical applications of NNs.

The Best
++++++++

The best of the best
--------------------

Bishop (1995) is clearly the single best book on artificial NNs. This book excels in organization and choice of material, and is a close runner-up to Ripley (1996) for accuracy. If you are new to the field, read it from cover to cover. If you have lots of experience with NNs, it's an excellent reference. If you don't know calculus, take a class. I hope a second edition comes out soon! For more information, see "The best intermediate textbooks on NNs" below.

If you have questions on feedforward nets that aren't answered by Bishop, try Masters (1993) or Reed and Marks (1999) for practical issues or Ripley (1996) for theoretical issues, all of which are reviewed below.

The best popular introduction to NNs
------------------------------------

Hinton, G.E.
(1992), "How Neural Networks Learn from Experience", Scientific American, 267 (September), 144-151 (page numbers are for the US edition).
Author's Webpage: http://www.cs.utoronto.ca/DCS/People/Faculty/hinton.html (official) and http://www.cs.toronto.edu/~hinton (private)
Journal Webpage: http://www.sciam.com/
Additional Information: Unfortunately that article is not available there.

The best introductory book for business executives
--------------------------------------------------

Bigus, J.P. (1996), Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support, NY: McGraw-Hill, ISBN 0070057796, xvii+221 pages.

The stereotypical business executive (SBE) does not want to know how or why NNs work; he (SBEs are usually male) just wants to make money. The SBE may know what an average or percentage is, but he is deathly afraid of "statistics". He understands profit and loss but does not want to waste his time learning things involving complicated math, such as high-school algebra. For further information on the SBE, see the "Dilbert" comic strip.

Bigus has written an excellent introduction to NNs for the SBE. Bigus says (p. xv), "For business executives, managers, or computer professionals, this book provides a thorough introduction to neural network technology and the issues related to its application without getting bogged down in complex math or needless details. The reader will be able to identify common business problems that are amenable to the neural network approach and will be sensitized to the issues that can affect successful completion of such applications." Bigus succeeds in explaining NNs at a practical, intuitive, and necessarily shallow level without formulas, which is just what the SBE needs. This book is far better than Caudill and Butler (1990), a popular but disastrous attempt to explain NNs without formulas.

Chapter 1 introduces data mining and data warehousing, and sketches some applications thereof.
Chapter 2 is the semi-obligatory philosophico-historical discussion of AI and NNs and is well-written, although the SBE in a hurry may want to skip it. Chapter 3 is a very useful discussion of data preparation. Chapter 4 describes a variety of NNs and what they are good for. Chapter 5 goes into practical issues of training and testing NNs. Chapters 6 and 7 explain how to use the results from NNs. Chapter 8 discusses intelligent agents. Chapters 9 through 12 contain case histories of NN applications, including market segmentation, real-estate pricing, customer ranking, and sales forecasting.

Bigus provides generally sound advice. He briefly discusses overfitting and overtraining without going into much detail, although I think his advice on p. 57 to have at least two training cases for each connection is somewhat lenient, even for noise-free data. I do not understand his claim on pp. 73 and 170 that RBF networks have advantages over backprop networks for nonstationary inputs; perhaps he is using the word "nonstationary" in a sense different from the statistical meaning of the term. There are other things in the book that I would quibble with, but I did not find any of the flagrant errors that are common in other books on NN applications such as Swingler (1996).

The one serious drawback of this book is that it is more than one page long and may therefore tax the attention span of the SBE. But any SBE who succeeds in reading the entire book should learn enough to be able to hire a good NN expert to do the real work.

The best elementary textbooks
-----------------------------

Fausett, L. (1994), Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Englewood Cliffs, NJ: Prentice Hall, ISBN 0133341860. Also published as a Prentice Hall International Edition, ISBN 0130422509. Sample software (source code listings in C and Fortran) is included in an Instructor's Manual.
Book Webpage (Publisher): http://www.prenhall.com/books/esm_0133341860.html
Additional Information: The programs and additional support mentioned are not available.

Contents:
   Ch. 1 Introduction, 1.1 Why Neural Networks and Why Now?, 1.2 What Is a Neural Net?, 1.3 Where Are Neural Nets Being Used?, 1.4 How Are Neural Networks Used?, 1.5 Who Is Developing Neural Networks?, 1.6 When Neural Nets Began: the McCulloch-Pitts Neuron;
   Ch. 2 Simple Neural Nets for Pattern Classification, 2.1 General Discussion, 2.2 Hebb Net, 2.3 Perceptron, 2.4 Adaline;
   Ch. 3 Pattern Association, 3.1 Training Algorithms for Pattern Association, 3.2 Heteroassociative Memory Neural Network, 3.3 Autoassociative Net, 3.4 Iterative Autoassociative Net, 3.5 Bidirectional Associative Memory (BAM);
   Ch. 4 Neural Networks Based on Competition, 4.1 Fixed-Weight Competitive Nets, 4.2 Kohonen Self-Organizing Maps, 4.3 Learning Vector Quantization, 4.4 Counterpropagation;
   Ch. 5 Adaptive Resonance Theory, 5.1 Introduction, 5.2 Art1, 5.3 Art2;
   Ch. 6 Backpropagation Neural Net, 6.1 Standard Backpropagation, 6.2 Variations, 6.3 Theoretical Results;
   Ch. 7 A Sampler of Other Neural Nets, 7.1 Fixed Weight Nets for Constrained Optimization, 7.2 A Few More Nets that Learn, 7.3 Adaptive Architectures, 7.4 Neocognitron;
   Glossary.

Review by Ian Cresswell:

What a relief! As a broad introductory text this is without any doubt the best currently available in its area. It doesn't include source code of any kind (normally this is badly written and compiler specific). The algorithms for many different kinds of simple neural nets are presented in a clear step-by-step manner in plain English.

Equally, the mathematics is introduced in a relatively gentle manner. There are no unnecessary complications or diversions from the main theme.

The examples that are used to demonstrate the various algorithms are detailed but (perhaps necessarily) simple.

There are bad things that can be said about most books.
There are only a small number of minor criticisms that can be made about this one. More space should have been given to backprop and its variants because of the practical importance of such methods. And while the author discusses early stopping in one paragraph, the treatment of generalization is skimpy compared to the books by Weiss and Kulikowski or Smith listed below.

If you're new to neural nets and you don't want to be swamped by bogus ideas, huge amounts of intimidating looking mathematics, a programming language that you don't know etc. etc. then this is the book for you.

In summary, this is the best starting point for the outsider and/or beginner... a truly excellent text.

Smith, M. (1996). Neural Networks for Statistical Modeling, NY: Van Nostrand Reinhold, ISBN 0442013108.
Apparently there is a new edition I haven't seen yet: Smith, M. (1996). Neural Networks for Statistical Modeling, Boston: International Thomson Computer Press, ISBN 1850328420.
Book Webpage (Publisher): http://www.thompson.com/
Publisher's address: 20 Park Plaza, Suite 1001, Boston, MA 02116, USA.

Smith is not a statistician, but he has made an impressive effort to convey statistical fundamentals applied to neural networks. The book has entire brief chapters on overfitting and validation (early stopping and split-sample validation, which he incorrectly calls cross-validation), putting it a rung above most other introductions to NNs. There are also brief chapters on data preparation and diagnostic plots, topics usually ignored in elementary NN books. Only feedforward nets are covered in any detail.

Chapter headings: Mapping Functions; Basic Concepts; Error Derivatives; Learning Laws; Weight Initialization; The Course of Learning: An Example; Overfitting; Cross Validation; Preparing the Data; Representing Variables; Using the Model.

Weiss, S.M. and Kulikowski, C.A. (1991), Computer Systems That Learn, Morgan Kaufmann. ISBN 1558600655.
Author's Webpage: Kulikowski: http://ruccs.rutgers.edu/faculty/kulikowski.html
Book Webpage (Publisher): http://www.mkp.com/books_catalog/1558600655.asp
Additional Information: Information on Weiss, S.M. is not available.

Briefly covers at a very elementary level feedforward nets, linear and nearest-neighbor discriminant analysis, trees, and expert systems, emphasizing practical applications. For a book at this level, it has an unusually good chapter on estimating generalization error, including bootstrapping.

1 Overview of Learning Systems
   1.1 What is a Learning System?
   1.2 Motivation for Building Learning Systems
   1.3 Types of Practical Empirical Learning Systems
      1.3.1 Common Theme: The Classification Model
      1.3.2 Let the Data Speak
   1.4 What's New in Learning Methods
      1.4.1 The Impact of New Technology
   1.5 Outline of the Book
   1.6 Bibliographical and Historical Remarks
2 How to Estimate the True Performance of a Learning System
   2.1 The Importance of Unbiased Error Rate Estimation
   2.2 What is an Error?
      2.2.1 Costs and Risks
   2.3 Apparent Error Rate Estimates
   2.4 Too Good to Be True: Overspecialization
   2.5 True Error Rate Estimation
      2.5.1 The Idealized Model for Unlimited Samples
      2.5.2 Train-and-Test Error Rate Estimation
      2.5.3 Resampling Techniques
      2.5.4 Finding the Right Complexity Fit
   2.6 Getting the Most Out of the Data
   2.7 Classifier Complexity and Feature Dimensionality
      2.7.1 Expected Patterns of Classifier Behavior
   2.8 What Can Go Wrong?
      2.8.1 Poor Features, Data Errors, and Mislabeled Classes
      2.8.2 Unrepresentative Samples
   2.9 How Close to the Truth?
   2.10 Common Mistakes in Performance Analysis
   2.11 Bibliographical and Historical Remarks
3 Statistical Pattern Recognition
   3.1 Introduction and Overview
   3.2 A Few Sample Applications
   3.3 Bayesian Classifiers
      3.3.1 Direct Application of the Bayes Rule
   3.4 Linear Discriminants
      3.4.1 The Normality Assumption and Discriminant Functions
      3.4.2 Logistic Regression
   3.5 Nearest Neighbor Methods
   3.6 Feature Selection
   3.7 Error Rate Analysis
   3.8 Bibliographical and Historical Remarks
4 Neural Nets
   4.1 Introduction and Overview
   4.2 Perceptrons
      4.2.1 Least Mean Square Learning Systems
      4.2.2 How Good Is a Linear Separation Network?
   4.3 Multilayer Neural Networks
      4.3.1 Back-Propagation
      4.3.2 The Practical Application of Back-Propagation
   4.4 Error Rate and Complexity Fit Estimation
   4.5 Improving on Standard Back-Propagation
   4.6 Bibliographical and Historical Remarks
5 Machine Learning: Easily Understood Decision Rules
   5.1 Introduction and Overview
   5.2 Decision Trees
      5.2.1 Finding the Perfect Tree
      5.2.2 The Incredible Shrinking Tree
      5.2.3 Limitations of Tree Induction Methods
   5.3 Rule Induction
      5.3.1 Predictive Value Maximization
   5.4 Bibliographical and Historical Remarks
6 Which Technique is Best?
   6.1 What's Important in Choosing a Classifier?
      6.1.1 Prediction Accuracy
      6.1.2 Speed of Learning and Classification
      6.1.3 Explanation and Insight
   6.2 So, How Do I Choose a Learning System?
   6.3 Variations on the Standard Problem
      6.3.1 Missing Data
      6.3.2 Incremental Learning
   6.4 Future Prospects for Improved Learning Methods
   6.5 Bibliographical and Historical Remarks
7 Expert Systems
   7.1 Introduction and Overview
      7.1.1 Why Build Expert Systems? New vs. Old Knowledge
   7.2 Estimating Error Rates for Expert Systems
   7.3 Complexity of Knowledge Bases
      7.3.1 How Many Rules Are Too Many?
   7.4 Knowledge Base Example
   7.5 Empirical Analysis of Knowledge Bases
   7.6 Future: Combined Learning and Expert Systems
   7.7 Bibliographical and Historical Remarks

Reed, R.D., and Marks, R.J., II (1999), Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262181908.
Author's Webpage: Marks: http://cialab.ee.washington.edu/Marks.html
Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262181908

After you have read Smith (1996) or Weiss and Kulikowski (1991), consult Reed and Marks for practical details on training MLPs (other types of neural nets such as RBF networks are barely even mentioned). They provide extensive coverage of backprop and its variants, and they also survey conventional optimization algorithms. Their coverage of initialization methods, constructive networks, pruning, and regularization methods is unusually thorough. Unlike the vast majority of books on neural nets, this one has lots of really informative graphs. The chapter on generalization assessment is slightly weak, which is why you should read Smith (1996) or Weiss and Kulikowski (1991) first. Also, there is little information on data preparation, for which Smith (1996) and Masters (1993; see below) should be consulted. There is some elementary calculus, but not enough that it should scare off anybody. Many second-rate books treat neural nets as mysterious black boxes, but Reed and Marks open up the box and provide genuine insight into the way neural nets work.

One problem with the book is that the terms "validation set" and "test set" are used inconsistently.
Chapter headings: Supervised Learning; Single-Layer Networks; MLP Representational Capabilities; Back-Propagation; Learning Rate and Momentum; Weight-Initialization Techniques; The Error Surface; Faster Variations of Back-Propagation; Classical Optimization Techniques; Genetic Algorithms and Neural Networks; Constructive Methods; Pruning Algorithms; Factors Influencing Generalization; Generalization Prediction and Assessment; Heuristics for Improving Generalization; Effects of Training with Noisy Inputs; Linear Regression; Principal Components Analysis; Jitter Calculations; Sigmoid-like Nonlinear Functions

The best books on using and programming NNs
-------------------------------------------

Masters, T. (1993), Practical Neural Network Recipes in C++, Academic Press, ISBN 0124790402, US $45 incl. disks.
Book Webpage (Publisher): http://www.apcatalog.com/cgibin/AP?ISBN=0124790402&LOCATION=US&FORM=FORM2

Masters has written three exceptionally good books on NNs (the two others are listed below). He combines generally sound practical advice with some basic statistical knowledge to produce a programming text that is far superior to the competition (see "The Worst" below). Not everyone likes his C++ code (the usual complaint is that the code is not sufficiently OO) but, unlike the code in some other books, Masters's code has been successfully compiled and run by some readers of comp.ai.neural-nets. Masters's books are well worth reading even for people who have no interest in programming.
Chapter headings: Foundations; Classification; Autoassociation; Time-Series Prediction; Function Approximation; Multilayer Feedforward Networks; Eluding Local Minima I: Simulated Annealing; Eluding Local Minima II: Genetic Optimization; Regression and Neural Networks; Designing Feedforward Network Architectures; Interpreting Weights: How Does This Thing Work; Probabilistic Neural Networks; Functional Link Networks; Hybrid Networks; Designing the Training Set; Preparing Input Data; Fuzzy Data and Processing; Unsupervised Training; Evaluating Performance of Neural Networks; Confidence Measures; Optimizing the Decision Threshold; Using the NEURAL Program.

Masters, T. (1995) Advanced Algorithms for Neural Networks: A C++ Sourcebook, NY: John Wiley and Sons, ISBN 0471105880
Book Webpage (Publisher): http://www.wiley.com/
Additional Information: One has to search.

Clear explanations of conjugate gradient and Levenberg-Marquardt optimization algorithms, simulated annealing, kernel regression (GRNN) and discriminant analysis (PNN), Gram-Charlier networks, dimensionality reduction, cross-validation, and bootstrapping.

Masters, T. (1994), Signal and Image Processing with Neural Networks: A C++ Sourcebook, NY: Wiley, ISBN 0471049638.
Book Webpage (Publisher): http://www.wiley.com/
Additional Information: One has to search.

The best intermediate textbooks on NNs
--------------------------------------

Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford: Oxford University Press. ISBN 0198538499 (hardback) or 0198538642 (paperback), xvii+482 pages.
Book Webpage (Author): http://research.microsoft.com/~cmbishop/nnpr.htm
Book Webpage (Publisher): http://www.oup.co.uk/isbn/0198538642

This is definitely the best book on feedforward neural nets for readers comfortable with calculus. The book is exceptionally well organized, presenting topics in a logical progression ideal for conceptual understanding.
Geoffrey Hinton writes in the foreword:
"Bishop is a leading researcher who has a deep understanding of the material and has gone to great lengths to organize it in a sequence that makes sense. He has wisely avoided the temptation to try to cover everything and has therefore omitted interesting topics like reinforcement learning, Hopfield networks, and Boltzmann machines in order to focus on the types of neural networks that are most widely used in practical applications. He assumes that the reader has the basic mathematical literacy required for an undergraduate science degree, and using these tools he explains everything from scratch. Before introducing the multilayer perceptron, for example, he lays a solid foundation of basic statistical concepts. So the crucial concept of overfitting is introduced using easily visualized examples of one-dimensional polynomials and only later applied to neural networks. An impressive aspect of this book is that it takes the reader all the way from the simplest linear models to the very latest Bayesian multilayer neural networks without ever requiring any great intellectual leaps."

Chapter headings: Statistical Pattern Recognition; Probability Density Estimation; Single-Layer Networks; The Multilayer Perceptron; Radial Basis Functions; Error Functions; Parameter Optimization Algorithms; Preprocessing and Feature Extraction; Learning and Generalization; Bayesian Techniques; Symmetric Matrices; Gaussian Integrals; Lagrange Multipliers; Calculus of Variations; Principal Components.

Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley, ISBN 0201503956 (hardbound) and 0201515601 (paperbound).
Book Webpage (Publisher): http://www2.awl.com/gb/abp/sfi/computer.html

This is an excellent classic work on neural nets from the perspective of physics covering a wide variety of networks.
Comments from readers of comp.ai.neural-nets: "My first impression is that this one is by far the best book on the topic. And it's below $30 for the paperback."; "Well written, theoretical (but not overwhelming)"; "It provides a good balance of model development, computational algorithms, and applications. The mathematical derivations are especially well done"; "Nice mathematical analysis on the mechanism of different learning algorithms"; "It is NOT for mathematical beginner. If you don't have a good grasp of higher level math, this book can be really tough to get through."

The best advanced textbook covering NNs
---------------------------------------

Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press, ISBN 0521460867 (hardback), xii+403 pages.
Author's Webpage: http://www.stats.ox.ac.uk/~ripley/
Book Webpage (Publisher): http://www.cup.cam.ac.uk/
Additional Information: The Webpage includes errata and additional information for this book that was not available at publication time.

Brian Ripley's book is an excellent sequel to Bishop (1995). Ripley picks up where Bishop left off, with Bayesian inference and statistical decision theory, and then covers some of the same material on NNs as Bishop but at a higher mathematical level. Ripley also covers a variety of methods that are not discussed, or discussed only briefly, by Bishop, such as tree-based methods and belief networks. While Ripley is best appreciated by people with a background in mathematical statistics, the numerous realistic examples in his book will be of interest even to beginners in neural nets.

Chapter headings: Introduction and Examples; Statistical Decision Theory; Linear Discriminant Analysis; Flexible Discriminants; Feed-forward Neural Networks; Nonparametric Methods; Tree-structured Classifiers; Belief Networks; Unsupervised Methods; Finding Good Pattern Features; Statistical Sidelines.

Devroye, L., Györfi, L., and Lugosi, G.
(1996), A Probabilistic Theory of Pattern Recognition, NY: Springer, ISBN 0387946187, vii+636 pages.

This book has relatively little material explicitly about neural nets, but what it has is very interesting and much of it is not found in other texts. The emphasis is on statistical proofs of universal consistency for a wide variety of methods, including histograms, (k) nearest neighbors, kernels (PNN), trees, generalized linear discriminants, MLPs, and RBF networks. There is also considerable material on validation and cross-validation. The authors say, "We did not scar the pages with back-breaking simulations or quick-and-dirty engineering solutions" (p. 7). The formula-to-text ratio is high, but the writing is quite clear, and anyone who has had a year or two of mathematical statistics should be able to follow the exposition.

Chapter headings: The Bayes Error; Inequalities and Alternate Distance Measures; Linear Discrimination; Nearest Neighbor Rules; Consistency; Slow Rates of Convergence; Error Estimation; The Regular Histogram Rule; Kernel Rules; Consistency of the k-Nearest Neighbor Rule; Vapnik-Chervonenkis Theory; Combinatorial Aspects of Vapnik-Chervonenkis Theory; Lower Bounds for Empirical Classifier Selection; The Maximum Likelihood Principle; Parametric Classification; Generalized Linear Discrimination; Complexity Regularization; Condensed and Edited Nearest Neighbor Rules; Tree Classifiers; Data-Dependent Partitioning; Splitting the Data; The Resubstitution Estimate; Deleted Estimates of the Error Probability; Automatic Kernel Rules; Automatic Nearest Neighbor Rules; Hypercubes and Discrete Spaces; Epsilon Entropy and Totally Bounded Sets; Uniform Laws of Large Numbers; Neural Networks; Other Error Estimates; Feature Extraction.

The best books on neurofuzzy systems
------------------------------------

Brown, M., and Harris, C. (1994), Neurofuzzy Adaptive Modelling and Control, NY: Prentice Hall, ISBN 0131344536.
Author's Webpage: http://www.isis.ecs.soton.ac.uk/people/m_brown.html and http://www.ecs.soton.ac.uk/~cjh/
Book Webpage (Publisher): http://www.prenhall.com/books/esm_0131344536.html
Additional Information: Additional page at: http://www.isis.ecs.soton.ac.uk/publications/neural/mqbcjh94e.html and an abstract can be found at: http://www.isis.ecs.soton.ac.uk/publications/neural/mqb93.html

Brown and Harris rely on the fundamental insight that a fuzzy system is a nonlinear mapping from an input space to an output space that can be parameterized in various ways and therefore can be adapted to data using the usual neural training methods (see "What is backprop?") or conventional numerical optimization algorithms (see "What are conjugate gradients, Levenberg-Marquardt, etc.?"). Their approach makes clear the intimate connections between fuzzy systems, neural networks, and statistical methods such as B-spline regression.

The best comparison of NNs with other classification methods
------------------------------------------------------------

Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), Machine Learning, Neural and Statistical Classification, Ellis Horwood.
Author's Webpage: Donald Michie: http://www.aiai.ed.ac.uk/~dm/dm.html
Additional Information: This book is out of print but available online at http://www.amsta.leeds.ac.uk/~charles/statlog/

Other notable books
+++++++++++++++++++

Introductory
------------

Anderson, J.A. (1995), An Introduction to Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262011441.
Author's Webpage: http://www.cog.brown.edu/~anderson
Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262510812 or http://mitpress.mit.edu/bookhome.tcl?isbn=0262011441 (hardback)
Additional Information: Programs and additional information can be found at: ftp://mitpress.mit.edu/pub/IntrotoNeuralNets/

Anderson provides an accessible introduction to the AI and neurophysiological sides of NN research, although the book is weak regarding practical aspects of using NNs.
Chapter headings: Properties of Single Neurons; Synaptic Integration and Neuron Models; Essential Vector Operations; Lateral Inhibition and Sensory Processing; Simple Matrix Operations; The Linear Associator: Background and Foundations; The Linear Associator: Simulations; Early Network Models: The Perceptron; Gradient Descent Algorithms; Representation of Information; Applications of Simple Associators: Concept Formation and Object Motion; Energy and Neural Networks: Hopfield Networks and Boltzmann Machines; Nearest Neighbor Models; Adaptive Maps; The BSB Model: A Simple Nonlinear Autoassociative Neural Network; Associative Computation; Teaching Arithmetic to a Neural Network.

Hagan, M.T., Demuth, H.B., and Beale, M. (1996), Neural Network Design, Boston: PWS, ISBN 0534943322.

It doesn't really say much about design, but this book provides formulas and examples in excruciating detail for a wide variety of networks. It also includes some mathematical background material.

Chapter headings: Neuron Model and Network Architectures; An Illustrative Example; Perceptron Learning Rule; Signal and Weight Vector Spaces; Linear Transformations for Neural Networks; Supervised Hebbian Learning; Performance Surfaces and Optimum Points; Performance Optimization; Widrow-Hoff Learning; Backpropagation; Variations on Backpropagation; Associative Learning; Competitive Networks; Grossberg Network; Adaptive Resonance Theory; Stability; Hopfield Network.

Abdi, H., Valentin, D., and Edelman, B. (1999), Neural Networks, Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-124, Thousand Oaks, CA: Sage, ISBN 0761914404.

Inexpensive and brief (89 pages), but with very detailed explanations of linear networks and the basics of backpropagation.

Chapter headings: 1. Introduction 2. The Perceptron 3. Linear Autoassociative Memories 4. Linear Heteroassociative Memories 5. Error Backpropagation 6. Useful References.

Bayesian learning
-----------------

Neal, R. M.
(1996) Bayesian Learning for Neural Networks, New York: Springer-Verlag, ISBN 0387947248.

Biological learning and neurophysiology
---------------------------------------

Koch, C., and Segev, I., eds. (1998) Methods in Neuronal Modeling: From Ions to Networks, 2nd ed., Cambridge, MA: The MIT Press, ISBN 0262112310.
Book Webpage: http://goethe.klab.caltech.edu/MNM/

Rolls, E.T., and Treves, A. (1997), Neural Networks and Brain Function, Oxford: Oxford University Press, ISBN: 0198524323.

Chapter headings: Introduction; Pattern association memory; Autoassociation memory; Competitive networks, including self-organizing maps; Error-correcting networks: perceptrons, the delta rule, backpropagation of error in multilayer networks, and reinforcement learning algorithms; The hippocampus and memory; Pattern association in the brain: amygdala and orbitofrontal cortex; Cortical networks for invariant pattern recognition; Motor systems: cerebellum and basal ganglia; Cerebral neocortex.

Schmajuk, N.A. (1996) Animal Learning and Cognition: A Neural Network Approach, Cambridge: Cambridge University Press, ISBN 0521456967.

Chapter headings: Neural networks and associative learning; Classical conditioning: data and theories; Cognitive mapping; Attentional processes; Storage and retrieval processes; Configural processes; Timing; Operant conditioning and animal communication: data, theories, and networks; Animal cognition: data and theories; Place learning and spatial navigation; Maze learning and cognitive mapping; Learning, cognition, and the hippocampus: data and theories; Hippocampal modulation of learning and cognition; The character of the psychological law.

Collections
-----------

Orr, G.B., and Mueller, K.R., eds. (1998), Neural Networks: Tricks of the Trade, Berlin: Springer, ISBN 3540653112.

Articles: Efficient BackProp; Early Stopping - But When?;
A Simple Trick for Estimating the Weight Decay Parameter; Controlling the Hyperparameter Search in MacKay's Bayesian Neural Network Framework; Adaptive Regularization in Neural Network Modeling; Large Ensemble Averaging; Square Unit Augmented, Radially Extended, Multilayer Perceptrons; A Dozen Tricks with Multitask Learning; Solving the Ill-Conditioning in Neural Network Learning; Centering Neural Network Gradient Factors; Avoiding Roundoff Error in Backpropagating Derivatives; Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation; Combining Neural Networks and Context-Driven Search for On-Line, Printed Handwriting Recognition in the Newton; Neural Network Classification and Prior Class Probabilities; Applying Divide and Conquer to Large Scale Pattern Recognition Tasks; Forecasting the Economy with Neural Nets: A Survey of Challenges and Solutions; How to Train Neural Networks.

Arbib, M.A., ed. (1995), The Handbook of Brain Theory and Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262511029.

From the publisher: The heart of the book, part III, comprises 267 original articles by leaders in the various fields, arranged alphabetically by title. Parts I and II, written by the editor, are designed to help readers orient themselves to this vast range of material. Part I, Background, introduces several basic neural models, explains how the present study of brain theory and neural networks integrates brain theory, artificial intelligence, and cognitive psychology, and provides a tutorial on the concepts essential for understanding neural networks as dynamic, adaptive systems. Part II, Road Maps, provides entry into the many articles of part III through an introductory "Meta-Map" and twenty-three road maps, each of which tours all the Part III articles on the chosen theme.
Touretzky, D., Hinton, G., and Sejnowski, T., eds., (1989) Proceedings of the 1988 Connectionist Models Summer School, San Mateo, CA: Morgan Kaufmann, ISBN: 1558600337

NIPS:

 1. Touretzky, D.S., ed. (1989), Advances in Neural Information Processing Systems 1, San Mateo, CA: Morgan Kaufmann, ISBN: 1558600159
 2. Touretzky, D. S., ed. (1990), Advances in Neural Information Processing Systems 2, San Mateo, CA: Morgan Kaufmann, ISBN: 1558601007
 3. Lippmann, R.P., Moody, J.E., and Touretzky, D. S., eds. (1991) Advances in Neural Information Processing Systems 3, San Mateo, CA: Morgan Kaufmann, ISBN: 1558601848
 4. Moody, J.E., Hanson, S.J., and Lippmann, R.P., eds. (1992) Advances in Neural Information Processing Systems 4, San Mateo, CA: Morgan Kaufmann, ISBN: 1558602224
 5. Hanson, S.J., Cowan, J.D., and Giles, C.L. eds. (1993) Advances in Neural Information Processing Systems 5, San Mateo, CA: Morgan Kaufmann, ISBN: 1558602747
 6. Cowan, J.D., Tesauro, G., and Alspector, J., eds. (1994) Advances in Neural Information Processing Systems 6, San Mateo, CA: Morgan Kaufmann, ISBN: 1558603220
 7. Tesauro, G., Touretzky, D., and Leen, T., eds. (1995) Advances in Neural Information Processing Systems 7, Cambridge, MA: The MIT Press, ISBN: 0262201046
 8. Touretzky, D. S., Mozer, M.C., and Hasselmo, M.E., eds. (1996) Advances in Neural Information Processing Systems 8, Cambridge, MA: The MIT Press, ISBN: 0262201070
 9. Mozer, M.C., Jordan, M.I., and Petsche, T., eds. (1997) Advances in Neural Information Processing Systems 9, Cambridge, MA: The MIT Press, ISBN: 0262100657
10. Jordan, M.I., Kearns, M.S., and Solla, S.A., eds. (1998) Advances in Neural Information Processing Systems 10, Cambridge, MA: The MIT Press, ISBN: 0262100762
11. Kearns, M.S., Solla, S.A., and Cohn, D.A., eds. (1999) Advances in Neural Information Processing Systems 11, Cambridge, MA: The MIT Press, ISBN: 0262112450
12. Solla, S.A., Leen, T., and Müller, K.R., eds.
(2000) Advances in Neural Information Processing Systems 12, Cambridge, MA: The MIT Press, ISBN: 0262194503 Combining networks  Sharkey, A.J.C. (1999), Combining Artificial Neural Nets: Ensemble and Modular MultiNet Systems, London: Springer, ISBN: 185233004X Connectionism  Elman, J.L., Bates, E.A., Johnson, M.H., KarmiloffSmith, A., and Parisi, D. (1996) Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: The MIT Press, ISBN: 026255030X. Chapter headings: New perspectives on development; Why connectionism? Ontogenetic development: A connectionist synthesis; The shape of change; Brain development; Interactions, all the way down; Rethinking innateness. Plunkett, K., and Elman, J.L. (1997), Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations, Cambridge, MA: The MIT Press, ISBN: 0262661055. Chapter headings: Introduction and overview; The methodology of simulations; Learning to use the simulator; Learning internal representations; Autoassociation; Generalization; Translation invariance; Simple recurrent networks; Critical points in learning; Modeling stages in cognitive development; Learning the English past tense; The importance of starting small. Feedforward networks  Fine, T.L. (1999) Feedforward Neural Network Methodology, NY: Springer, ISBN 0387987452. Husmeier, D. (1999), Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions, Berlin: Springer Verlag, ISBN 185233095. Fuzzy logic and neurofuzzy systems  See also "General (including SVMs and Fuzzy Logic)". Kosko, B. (1997), Fuzzy Engineering, Upper Saddle River, NJ: Prentice Hall, ISBN 0131249916. Kosko's new book is a big improvement over his older neurofuzzy book and makes an excellent sequel to Brown and Harris (1994). Nauck, D., Klawonn, F., and Kruse, R. (1997), Foundations of NeuroFuzzy Systems, Chichester: Wiley, ISBN 0471971510. 
Chapter headings: Historical and Biological Aspects; Neural Networks; Fuzzy Systems; Modelling Neuro-Fuzzy Systems; Cooperative Neuro-Fuzzy Systems; Hybrid Neuro-Fuzzy Systems; The Generic Fuzzy Perceptron; NEFCON: Neuro-Fuzzy Control; NEFCLASS: Neuro-Fuzzy Classification; NEFPROX: Neuro-Fuzzy Function Approximation; Neural Networks and Fuzzy Prolog; Using Neuro-Fuzzy Systems.

General (including SVMs and Fuzzy Logic)

Many books on neural networks, machine learning, etc., present various methods as miscellaneous tools without any conceptual framework relating different methods. The best of such neural net "cookbooks" is probably Haykin's (1999) second edition. Among conceptually-integrated books, there are two excellent books that use the Vapnik-Chervonenkis theory as a unifying theme, and provide strong coverage of support vector machines and fuzzy logic, as well as neural nets. Of these two, Kecman (2001) provides clearer explanations and better diagrams, but Cherkassky and Mulier (1998) are better organized and have an excellent section on unsupervised learning, especially self-organizing maps. I have been tempted to add both of these books to the "best" list, but I have not done so because I think VC theory is of doubtful practical utility for neural nets. However, if you are especially interested in VC theory and support vector machines, then both of these books can be highly recommended. To help you choose between them, a detailed table of contents is provided below for each book.

Haykin, S. (1999), Neural Networks: A Comprehensive Foundation, 2nd ed., Upper Saddle River, NJ: Prentice Hall, ISBN 0132733501. The second edition is much better than the first, which has been described as a core-dump of Haykin's brain. The second edition covers more topics, is easier to understand, and has better examples.
Chapter headings: Introduction; Learning Processes; Single Layer Perceptrons; Multilayer Perceptrons; RadialBasis Function Networks; Support Vector Machines; Committee Machines; Principal Components Analysis; SelfOrganizing Maps; InformationTheoretic Models; Stochastic Machines And Their Approximates Rooted in Statistical Mechanics; Neurodynamic Programming; Temporal Processing Using Feedforward Networks; Neurodynamics; Dynamically Driven Recurrent Networks. Kecman, V. (2001), Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models, Cambridge, MA: The MIT Press; ISBN: 0262112558. URL: http://www.supportvector.ws/ Detailed Table of Contents: 1. Learning and Soft Computing: Rationale, Motivations, Needs, Basics 1.1 Examples of Applications in Diverse Fields 1.2 Basic Tools of Soft Computing: Neural Networks, Fuzzy Logic Systems, and Support Vector Machines 1.2.1 Basics of Neural Networks 1.2.2 Basics of Fuzzy Logic Modeling 1.3 Basic Mathematics of Soft Computing 1.3.1 Approximation of Multivariate Functions 1.3.2 Nonlinear Error Surface and Optimization 1.4 Learning and Statistical Approaches to Regression and Classification 1.4.1 Regression 1.4.2 Classification Problems Simulation Experiments 2. Support Vector Machines 2.1 Risk Minimization Principles and the Concept of Uniform Convergence 2.2 The VC Dimension 2.3 Structural Risk Minimization 2.4 Support Vector Machine Algorithms 2.4.1 Linear Maximal Margin Classifier for Linearly Separable Data 2.4.2 Linear Soft Margin Classifier for Overlapping Classes 2.4.3 The Nonlinear Classifier 2.4.4 Regression by Support Vector Machines Problems Simulation Experiments 3. 
SingleLayer Networks 3.1 The Perceptron 3.1.1 The Geometry of Perceptron Mapping 3.1.2 Convergence Theorem and Perceptron Learning Rule 3.2 The Adaptive Linear Neuron (Adaline) and the Least Mean Square Algorithm 3.2.1 Representational Capabilities of the Adaline 3.2.2 Weights Learning for a Linear Processing Unit Problems Simulation Experiments 4. Multilayer Perceptrons 4.1 The Error Backpropagation Algorithm 4.2 The Generalized Delta Rule 4.3 Heuristics or Practical Aspects of the Error Backpropagation Algorithm 4.3.1 One, Two, or More Hidden Layers? 4.3.2 Number of Neurons in a Hidden Layer, or the BiasVariance Dilemma 4.3.3 Type of Activation Functions in a Hidden Layer and the Geometry of Approximation 4.3.4 Weights Initialization 4.3.5 Error Function for Stopping Criterion at Learning 4.3.6 Learning Rate and the Momentum Term Problems Simulation Experiments 5. Radial Basis Function Networks 5.1 IllPosed Problems and the Regularization Technique 5.2 Stabilizers and Basis Functions 5.3 Generalized Radial Basis Function Networks 5.3.1 Moving Centers Learning 5.3.2 Regularization with Nonradial Basis Functions 5.3.3 Orthogonal Least Squares 5.3.4 Optimal Subset Selection by Linear Programming Problems Simulation Experiments 6. Fuzzy Logic Systems 6.1 Basics of Fuzzy Logic Theory 6.1.1 Crisp (or Classic) and Fuzzy Sets 6.1.2 Basic Set Operations 6.1.3 Fuzzy Relations 6.1.4 Composition of Fuzzy Relations 6.1.5 Fuzzy Inference 6.1.6 Zadeh's Compositional Rule of Inference 6.1.7 Defuzzification 6.2 Mathematical Similarities between Neural Networks and Fuzzy Logic Models 6.3 Fuzzy Additive Models Problems Simulation Experiments 7. 
Case Studies 7.1 Neural Networks-Based Adaptive Control 7.1.1 General Learning Architecture, or Direct Inverse Modeling 7.1.2 Indirect Learning Architecture 7.1.3 Specialized Learning Architecture 7.1.4 Adaptive Backthrough Control 7.2 Financial Time Series Analysis 7.3 Computer Graphics 7.3.1 One-Dimensional Morphing 7.3.2 Multidimensional Morphing 7.3.3 Radial Basis Function Networks for Human Animation 7.3.4 Radial Basis Function Networks for Engineering Drawings 8. Basic Nonlinear Optimization Methods 8.1 Classical Methods 8.1.1 Newton-Raphson Method 8.1.2 Variable Metric or Quasi-Newton Methods 8.1.3 Davidon-Fletcher-Powell Method 8.1.4 Broyden-Fletcher-Goldfarb-Shanno Method 8.1.5 Conjugate Gradient Methods 8.1.6 Fletcher-Reeves Method 8.1.7 Polak-Ribiere Method 8.1.8 Two Specialized Algorithms for a Sum-of-Error-Squares Error Function: Gauss-Newton Method, Levenberg-Marquardt Method 8.2 Genetic Algorithms and Evolutionary Computing 8.2.1 Basic Structure of Genetic Algorithms 8.2.2 Mechanism of Genetic Algorithms 9. Mathematical Tools of Soft Computing 9.1 Systems of Linear Equations 9.2 Vectors and Matrices 9.3 Linear Algebra and Analytic Geometry 9.4 Basics of Multivariable Analysis 9.5 Basics from Probability Theory

Cherkassky, V.S., and Mulier, F.M. (1998), Learning from Data: Concepts, Theory, and Methods, NY: John Wiley & Sons; ISBN: 0471154938.
Detailed Table of Contents: 1 Introduction 1.1 Learning and Statistical Estimation 1.2 Statistical Dependency and Causality 1.3 Characterization of Variables 1.4 Characterization of Uncertainty References 2 Problem Statement, Classical Approaches, and Adaptive Learning 2.1 Formulation of the Learning Problem 2.1.1 Role of the Learning Machine 2.1.2 Common Learning Tasks 2.1.3 Scope of the Learning Problem Formulation 2.2 Classical Approaches 2.2.1 Density Estimation 2.2.2 Classification (Discriminant Analysis) 2.2.3 Regression 2.2.4 Stochastic Approximation 2.2.5 Solving Problems with Finite Data 2.2.6 Nonparametric Methods 2.3 Adaptive Learning: Concepts and Inductive Principles 2.3.1 Philosophy, Major Concepts, and Issues 2.3.2 A priori Knowledge and Model Complexity 2.3.3 Inductive Principles 2.4 Summary References 3 Regularization Framework 3.1 Curse and Complexity of Dimensionality 3.2 Function Approx. and Characterization of Complexity 3.3 Penalization 3.3.1 Parametric Penalties 3.3.2 Nonparametric Penalties 3.4 Model Selection (Complexity Control) 3.4.1 Analytical Model Selection Criteria 3.4.2 Model Selection via Resampling 3.4.3 Biasvariance Tradeoff 3.4.4 Example of Model Selection 3.5 Summary References 4 Statistical Learning Theory 4.1 Conditions for Consistency and Convergence of ERM 4.2 Growth Function and VCDimension 4.2.1 VCDimension of the Set of RealValued Functions 4.2.2 VCDim. for Classification and Regression Problems 4.2.3 Examples of Calculating VCDimension 4.3 Bounds on the Generalization 4.3.1 Classification 4.3.2 Regression 4.3.3 Generalization Bounds and Sampling Theorem 4.4 Structural Risk Minimization 4.5 Case Study: Comparison of Methods for Model Selection 4.6 Summary References 5 Nonlinear Optimization Strategies 5.1 Stochastic Approximation Methods 5.1.1 Linear Parameter Estimation 5.1.2 Backpropagation Training of MLP Networks 5.2 Iterative Methods 5.2.1 ExpectationMaximization Methods for Density Est. 
5.2.2 Generalized Inverse Training of MLP Networks 5.3 Greedy Optimization 5.3.1 Neural Network Construction Algorithms 5.3.2 Classification and Regression Trees (CART) 5.4 Feature Selection, Optimization, and Stat. Learning Th. 5.5 Summary References 6 Methods for Data Reduction and Dim. Reduction 6.1 Vector Quantization 6.1.1 Optimal Source Coding in Vector Quantization 6.1.2 Generalized Lloyd Algorithm 6.1.3 Clustering and Vector Quantization 6.1.4 EM Algorithm for VQ and Clustering 6.2 Dimensionality Reduction: Statistical Methods 6.2.1 Linear Principal Components 6.2.2 Principal Curves and Surfaces 6.3 Dimensionality Reduction: Neural Network Methods 6.3.1 Discrete Principal Curves and Selforg. Map Alg. 6.3.2 Statistical Interpretation of the SOM Method 6.3.3 Flowthrough Version of the SOM and Learning Rate Schedules 6.3.4 SOM Applications and Modifications 6.3.5 Selfsupervised MLP 6.4 Summary References 7 Methods for Regression 7.1 Taxonomy: Dictionary versus Kernel Representation 7.2 Linear Estimators 7.2.1 Estimation of Linear Models and Equivalence of Representations 7.2.2 Analytic Form of Crossvalidation 7.2.3 Estimating Complexity of Penalized Linear Models 7.3 Nonadaptive Methods 7.3.1 Local Polynomial Estimators and Splines 7.3.2 Radial Basis Function Networks 7.3.3 Orthogonal Basis Functions and Wavelets 7.4 Adaptive Dictionary Methods 7.4.1 Additive Methods and Projection Pursuit Regression 7.4.2 Multilayer Perceptrons and Backpropagation 7.4.3 Multivariate Adaptive Regression Splines 7.5 Adaptive Kernel Methods and Local Risk Minimization 7.5.1 Generalized MemoryBased Learning 7.5.2 Constrained Topological Mapping 7.6 Empirical Comparisons 7.6.1 Experimental Setup 7.6.2 Summary of Experimental Results 7.7 Combining Predictive Models 7.8 Summary References 8 Classification 8.1 Statistical Learning Theory formulation 8.2 Classical Formulation 8.3 Methods for Classification 8.3.1 RegressionBased Methods 8.3.2 TreeBased Methods 8.3.3 Nearest Neighbor 
and Prototype Methods 8.3.4 Empirical Comparisons 8.4 Summary References 9 Support Vector Machines 9.1 Optimal Separating Hyperplanes 9.2 High Dimensional Mapping and Inner Product Kernels 9.3 Support Vector Machine for Classification 9.4 Support Vector Machine for Regression 9.5 Summary References 10 Fuzzy Systems 10.1 Terminology, Fuzzy Sets, and Operations 10.2 Fuzzy Inference Systems and Neurofuzzy Systems 10.2.1 Fuzzy Inference Systems 10.2.2 Equivalent Basis Function Representation 10.2.3 Learning Fuzzy Rules from Data 10.3 Applications in Pattern Recognition 10.3.1 Fuzzy Input Encoding and Fuzzy Postprocessing 10.3.2 Fuzzy Clustering 10.4 Summary References Appendix A: Review of Nonlinear Optimization Appendix B: Eigenvalues and Singular Value Decomposition

History

Hebb, D.O. (1949), The Organization of Behavior, NY: Wiley. Out of print.

Rosenblatt, F. (1962), Principles of Neurodynamics, NY: Spartan Books. Out of print.

Anderson, J.A., and Rosenfeld, E., eds. (1988), Neurocomputing: Foundations of Research, Cambridge, MA: The MIT Press, ISBN 0262010976. Author's Webpage: http://www.cog.brown.edu/~anderson Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262510480 43 articles of historical importance, ranging from William James to Rumelhart, Hinton, and Williams.

Anderson, J.A., Pellionisz, A., and Rosenfeld, E., eds. (1990), Neurocomputing 2: Directions for Research, Cambridge, MA: The MIT Press. Author's Webpage: http://www.cog.brown.edu/~anderson Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262510758

Carpenter, G.A., and Grossberg, S., eds. (1991), Pattern Recognition by Self-Organizing Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262031760. Articles on ART, BAM, SOMs, counterpropagation, etc.

Nilsson, N.J. (1965/1990), Learning Machines, San Mateo, CA: Morgan Kaufmann, ISBN 1558601236.

Minsky, M.L., and Papert, S.A. (1969/1988) Perceptrons, Cambridge, MA: The MIT Press, 1st ed.
1969, expanded edition 1988, ISBN 0262631113.

Werbos, P.J. (1994), The Roots of Backpropagation, NY: John Wiley & Sons, ISBN: 0471598976. Includes Werbos's 1974 Harvard Ph.D. thesis, Beyond Regression.

Kohonen, T. (1984/1989), Self-organization and Associative Memory, 1st ed. 1984, 3rd ed. 1989, NY: Springer. Author's Webpage: http://www.cis.hut.fi/nnrc/teuvo.html Book Webpage (Publisher): http://www.springer.de/ Additional Information: Book is out of print.

Rumelhart, D.E., and McClelland, J.L. (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes 1 & 2, Cambridge, MA: The MIT Press, ISBN 0262631121. Author's Webpage: http://wwwmed.stanford.edu/school/Neurosciences/faculty/rumelhart.html Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262631121

Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley, ISBN 0201093553. Book Webpage (Publisher): http://www.awl.com/

Anderson, J.A., and Rosenfeld, E., eds. (1998), Talking Nets: An Oral History of Neural Networks, Cambridge, MA: The MIT Press, ISBN 0262511118.

Knowledge, rules, and expert systems

Gallant, S.I. (1995), Neural Network Learning and Expert Systems, Cambridge, MA: The MIT Press, ISBN 0262071452. Chapter headings: Introduction and Important Definitions; Representation Issues; Perceptron Learning and the Pocket Algorithm; Winner-Take-All Groups or Linear Machines; Autoassociators and One-Shot Learning; Mean Squared Error (MSE) Algorithms; Unsupervised Learning; The Distributed Method and Radial Basis Functions; Computational Learning Theory and the BRD Algorithm; Constructive Algorithms; Backpropagation; Backpropagation: Variations and Applications; Simulated Annealing and Boltzmann Machines; Expert Systems and Neural Networks; Details of the MACIE System; Noise, Redundancy, Fault Detection, and Bayesian Decision Theory; Extracting Rules from Networks; Appendix: Representation Comparisons.

Cloete, I., and Zurada, J.M.
(2000), Knowledge-Based Neurocomputing, Cambridge, MA: The MIT Press, ISBN 0262032740. Articles: Knowledge-Based Neurocomputing: Past, Present, and Future; Architectures and Techniques for Knowledge-Based Neurocomputing; Symbolic Knowledge Representation in Recurrent Neural Networks: Insights from Theoretical Models of Computation; A Tutorial on Neurocomputing of Structures; Structural Learning and Rule Discovery; VL_1ANN: Transformation of Rules to Artificial Neural Networks; Integrations of Heterogeneous Sources of Partial Domain Knowledge; Approximation of Differential Equations Using Neural Networks; Fynesse: A Hybrid Architecture for Self-Learning Control; Data Mining Techniques for Designing Neural Network Time Series Predictors; Extraction of Decision Trees from Artificial Neural Networks; Extraction of Linguistic Rules from Data via Neural Networks and Fuzzy Approximation; Neural Knowledge Processing in Expert Systems.

Learning theory

Wolpert, D.H., ed. (1995) The Mathematics of Generalization: The Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning, Santa Fe Institute Studies in the Sciences of Complexity, Volume XX, Reading, MA: Addison-Wesley, ISBN: 0201409836. Articles: The Status of Supervised Learning Science circa 1994: The Search for a Consensus; Reflections After Refereeing Papers for NIPS; The Probably Approximately Correct (PAC) and Other Learning Models; Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications; The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework; Statistical Physics Models of Supervised Learning; On Exhaustive Learning; A Study of Maximal-Coverage Learning Algorithms; On Bayesian Model Selection; Soft Classification, a.k.a.
Risk Estimation, via Penalized Log Likelihood and Smoothing Spline Analysis of Variance; Current Research; Preface to Simplifying Neural Networks by Soft Weight Sharing; Simplifying Neural Networks by Soft Weight Sharing; ErrorCorrecting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs; Image Segmentation and Recognition. Anthony, M., and Bartlett, P.L. (1999), Neural Network Learning: Theoretical Foundations, Cambridge: Cambridge University Press, ISBN 052157353X. Vapnik, V.N. (1998) Statistical Learning Theory, NY: Wiley, ISBN: 0471030031 This book is much better than Vapnik's The Nature of Statistical Learning Theory. Chapter headings: 0. Introduction: The Problem of Induction and Statistical Inference; 1. Two Approaches to the Learning Problem; Appendix: Methods for Solving IllPosed Problems; 2. Estimation of the Probability Measure and Problem of Learning; 3. Conditions for Consistency of Empirical Risk Minimization Principle; 4. Bounds on the Risk for Indicator Loss Functions; Appendix: Lower Bounds on the Risk of the ERM Principle; 5. Bounds on the Risk for RealValued Loss Functions; 6. The Structural Risk Minimization Principle; Appendix: Estimating Functions on the Basis of Indirect Measurements; 7. Stochastic IllPosed Problems; 8. Estimating the Values of Functions at Given Points; 9. Perceptrons and Their Generalizations; 10. The Support Vector Method for Estimating Indicator Functions; 11. The Support Vector Method for Estimating RealValued Functions; 12. SV Machines for Pattern Recognition; (includes examples of digit recognition) 13. SV Machines for Function Approximations, Regression Estimation, and Signal Processing; (includes an example of positron emission tomography) 14. Necessary and Sufficient Conditions for Uniform Convergence of Frequencies to Their Probabilities; 15. Necessary and Sufficient Conditions for Uniform Convergence of Means to Their Expectations; 16. 
Necessary and Sufficient Conditions for Uniform OneSided Convergence of Means to Their Expectations; Comments and Bibliographical Remarks. Object oriented programming  The FAQ maintainer is an oldfashioned C programmer and has no expertise in object oriented programming, so he must rely on the readers of comp.ai.neuralnets regarding the merits of books on OOP for NNs. There are many excellent books about NNs by Timothy Masters (listed elsewhere in the FAQ) that provide C++ code for NNs. If you simply want code that works, these books should satisfy your needs. If you want code that exemplifies the highest standards of object oriented design, you will be disappointed by Masters. The one book on OOP for NNs that seems to be consistently praised is: Rogers, Joey (1996), ObjectOriented Neural Networks in C++, Academic Press, ISBN 0125931158. Contents: 1. Introduction 2. ObjectOriented Programming Review 3. NeuralNetwork Base Classes 4. ADALINE Network 5. Backpropagation Neural Network 6. SelfOrganizing Neural Network 7. Bidirectional Associative Memory Appendix A Support Classes Appendix B Listings References and Suggested Reading However, you will learn very little about NNs other than elementary programming techniques from Rogers. To quote a customer review at the Barnes & Noble web site (http://www.bn.com): A reviewer, a scientific programmer, July 19, 2000, **** Long explaination of neural net code  not of neural nets Good OO code for simple 'off the shelf' implementation, very open & fairly extensible for further cusomization. A complete & lucid explanation of the code but pretty weak on the principles, theory, and application of neural networks. Great as a code source, disappointing as a neural network tutorial. Online and incremental learning  Saad, D., ed. (1998), OnLine Learning in Neural Networks, Cambridge: Cambridge University Press, ISBN 0521652634. 
Articles: Introduction; Online Learning and Stochastic Approximations; Exact and Perturbation Solutions for the Ensemble Dynamics; A Statistical Study of Online Learning; Online Learning in Switching and Drifting Environments with Application to Blind Source Separation; Parameter Adaptation in Stochastic Optimization; Optimal Online Learning in Multilayer Neural Networks; Universal Asymptotics in Committee Machines with Tree Architecture; Incorporating Curvature Information into Online Learning; Annealed Online Learning in Multilayer Neural Networks; Online Learning of Prototypes and Principal Components; Online Learning with Time-Correlated Examples; Online Learning from Finite Training Sets; Dynamics of Supervised Learning with Restricted Training Sets; Online Learning of a Decision Boundary with and without Queries; A Bayesian Approach to Online Learning; Optimal Perceptron Learning: an Online Bayesian Approach.

Optimization

Cichocki, A., and Unbehauen, R. (1993), Neural Networks for Optimization and Signal Processing, NY: John Wiley & Sons, ISBN 0471930105 (hardbound), 526 pages, $57.95. Book Webpage (Publisher): http://www.wiley.com/ Additional Information: One has to search. Chapter headings: Mathematical Preliminaries of Neurocomputing; Architectures and Electronic Implementation of Neural Network Models; Unconstrained Optimization and Learning Algorithms; Neural Networks for Linear, Quadratic Programming and Linear Complementarity Problems; A Neural Network Approach to the On-Line Solution of a System of Linear Algebraic Equations and Related Problems; Neural Networks for Matrix Algebra Problems; Neural Networks for Continuous, Nonlinear, Constrained Optimization Problems; Neural Networks for Estimation, Identification and Prediction; Neural Networks for Discrete and Combinatorial Optimization Problems.

Pulsed/Spiking networks

Maass, W., and Bishop, C.M., eds. (1999) Pulsed Neural Networks, Cambridge, MA: The MIT Press, ISBN: 0262133504.
Articles: Spiking Neurons; Computing with Spiking Neurons; PulseBased Computation in VLSI Neural Networks; Encoding Information in Neuronal Activity; Building Silicon Nervous Systems with Dendritic Tree Neuromorphs; A PulseCoded Communications Infrastructure; Analog VLSI Pulsed Networks for Perceptive Processing; Preprocessing for Pulsed Neural VLSI Systems; Digital Simulation of Spiking Neural Networks; Populations of Spiking Neurons; Collective Excitation Phenomena and Their Applications; Computing and Learning with Dynamic Synapses; Stochastic BitStream Neural Networks; Hebbian Learning of Pulse Timing in the Barn Owl Auditory System. Recurrent  Medsker, L.R., and Jain, L.C., eds. (2000), Recurrent Neural Networks: Design and Applications, Boca Raton, FL: CRC Press, ISBN 0849371813 Articles: Introduction; Recurrent Neural Networks for Optimization: The State of the Art; Efficient SecondOrder Learning Algorithms for DiscreteTime Recurrent Neural Networks; Designing High Order Recurrent Networks for Bayesian Belief Revision; Equivalence in Knowledge Representation: Automata, Recurrent Neural Networks, and Dynamical Fuzzy Systems; Learning LongTerm Dependencies in NARX Recurrent Neural Networks; Oscillation Responses in a Chaotic Recurrent Network; Lessons from Language Learning; Recurrent Autoassociative Networks: Developing Distributed Representations of Hierarchically Structured Sequences by Autoassociation; Comparison of Recurrent Neural Networks for Trajectory Generation; Training Algorithms for Recurrent Neural Nets that Eliminate the Need for Computation of Error Gradients with Application to Trajectory Production Problem; Training Recurrent Neural Networks for Filtering and Control; Remembering How to Behave: Recurrent Neural Networks for Adaptive Robot Behavior Reinforcement learning  Sutton, R.S., and Barto, A.G. (1998), Reinforcement Learning: An Introduction, The MIT Press, ISBN: 0262193981. 
Author's Webpage: http://envy.cs.umass.edu/~rich/sutton.html and http://wwwanw.cs.umass.edu/People/barto/barto.html Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262193981 Additional Information: http://wwwanw.cs.umass.edu/~rich/book/thebook.html Chapter headings: The Problem; Introduction; Evaluative Feedback; The Reinforcement Learning Problem; Elementary Solution Methods; Dynamic Programming; Monte Carlo Methods; Temporal-Difference Learning; A Unified View; Eligibility Traces; Generalization and Function Approximation; Planning and Learning; Dimensions of Reinforcement Learning; Case Studies.

Bertsekas, D.P., and Tsitsiklis, J.N. (1996), Neuro-Dynamic Programming, Belmont, MA: Athena Scientific, ISBN 1886529108. Author's Webpage: http://www.mit.edu:8001/people/dimitrib/home.html and http://web.mit.edu/jnt/www/home.html Book Webpage (Publisher): http://world.std.com/~athenasc/ndpbook.html

Speech recognition

Bourlard, H.A., and Morgan, N. (1994), Connectionist Speech Recognition: A Hybrid Approach, Boston: Kluwer Academic Publishers, ISBN: 0792393961. From The Publisher: Describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems based on Hidden Markov Models (HMMs) to improve their performance. In this framework, neural networks (and in particular, multilayer perceptrons or MLPs) have been restricted to well-defined subtasks of the whole system, i.e., HMM emission probability estimation and feature extraction. The book describes a successful five-year international collaboration between the authors. The lessons learned form a case study that demonstrates how hybrid systems can be developed to combine neural networks with more traditional statistical approaches. The book illustrates both the advantages and limitations of neural networks in the framework of a statistical system.
Using standard databases and comparing with some conventional approaches, it is shown that MLP probability estimation can improve recognition performance. Other approaches are discussed, though there is no such unequivocal experimental result for these methods. Connectionist Speech Recognition: A Hybrid Approach is of use to anyone intending to use neural networks for speech recognition or within the framework provided by an existing successful statistical approach. This includes research and development groups working in the field of speech recognition, both with standard and neural network approaches, as well as other pattern recognition and/or neural network researchers. This book is also suitable as a text for advanced courses on neural networks or speech processing. Statistics  Cherkassky, V., Friedman, J.H., and Wechsler, H., eds. (1991) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NY: Springer, ISBN 0387581995. Kay, J.W., and Titterington, D.M. (1999) Statistics and Neural Networks: Advances at the Interface, Oxford: Oxford University Press, ISBN 0198524226. Articles: Flexible Discriminant and Mixture Models; Neural Networks for Unsupervised Learning Based on Information Theory; Radial Basis Function Networks and Statistics; Robust Prediction in Manyparameter Models; Density Networks; Latent Variable Models and Data Visualisation; Analysis of Latent Structure Models with Multidimensional Latent Variables; Artificial Neural Networks and Multivariate Statistics. White, H. (1992b), Artificial Neural Networks: Approximation and Learning Theory, Blackwell, ISBN: 1557863296. 
Articles: There Exists a Neural Network That Does Not Make Avoidable Mistakes; Multilayer Feedforward Networks Are Universal Approximators; Universal Approximation Using Feedforward Networks with Nonsigmoid Hidden Layer Activation Functions; Approximating and Learning Unknown Mappings Using Multilayer Feedforward Networks with Bounded Weights; Universal Approximation of an Unknown Mapping and Its Derivatives; Neural Network Learning and Statistics; Learning in Artificial Neural Networks: a Statistical Perspective; Some Asymptotic Results for Learning in Single Hidden Layer Feedforward Networks; Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings; Nonparametric Estimation of Conditional Quantiles Using Neural Networks; On Learning the Derivatives of an Unknown Mapping with Multilayer Feedforward Networks; Consequences and Detection of Misspecified Nonlinear Regression Models; Maximum Likelihood Estimation of Misspecified Models; Some Results for Sieve Estimation with Dependent Observations. Timeseries forecasting  Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, MA: AddisonWesley, ISBN 0201626020. Book Webpage (Publisher): http://www2.awl.com/gb/abp/sfi/complexity.html Unsupervised learning  Kohonen, T. (1995/1997), SelfOrganizing Maps, 1st ed. 1995, 2nd ed. 1997, Berlin: SpringerVerlag, ISBN 3540620176. Deco, G. and Obradovic, D. (1996), An InformationTheoretic Approach to Neural Computing, NY: SpringerVerlag, ISBN 0387946667. Diamantaras, K.I., and Kung, S.Y. (1996) Principal Component Neural Networks: Theory and Applications, NY: Wiley, ISBN 0471054364. Van Hulle, M.M. (2000), Faithful Representations and Topographic Maps: From Distortion to InformationBased SelfOrganization, NY: Wiley, ISBN 0471345075. Books for the Beginner ++++++++++++++++++++++ Caudill, M. and Butler, C. (1990). Naturally Intelligent Systems. 
MIT Press: Cambridge, Massachusetts. (ISBN 0262031566). Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=0262531135 The authors try to translate mathematical formulas into English. The results are likely to disturb people who appreciate either mathematics or English. Have the authors never heard that "a picture is worth a thousand words"? What few diagrams they have (such as the one on p. 74) tend to be confusing. Their jargon is peculiar even by NN standards; for example, they refer to target values as "mentor inputs" (p. 66). The authors do not understand elementary properties of error functions and optimization algorithms. For example, in their discussion of the delta rule, the authors seem oblivious to the differences between batch and online training, and they attribute magical properties to the algorithm (p. 71):

   [The online delta] rule always takes the most efficient route from the current position of the weight vector to the "ideal" position, based on the current input pattern. The delta rule not only minimizes the mean squared error, it does so in the most efficient fashion possible, quite an achievement for such a simple rule.

While the authors realize that backpropagation networks can suffer from local minima, they mistakenly think that counterpropagation has some kind of global optimization ability (p. 202):

   Unlike the backpropagation network, a counterpropagation network cannot be fooled into finding a local minimum solution. This means that the network is guaranteed to find the correct response (or the nearest stored response) to an input, no matter what.

But even though they acknowledge the problem of local minima, the authors are ignorant of the importance of initial weight values (p. 186):

   To teach our imaginary network something using backpropagation, we must start by setting all the adaptive weights on all the neurodes in it to random values.
It won't matter what those values are, as long as they are not all the same and not equal to 1.

Like most introductory books, this one neglects the difficulties of getting good generalization--the authors simply declare (p. 8) that "A neural network is able to generalize"!

Chester, M. (1993). Neural Networks: A Tutorial, Englewood Cliffs, NJ: PTR Prentice Hall. Book Webpage (Publisher): http://www.prenhall.com/ Additional Information: Seems to be out of print. Shallow, sometimes confused, especially with regard to Kohonen networks.

Dayhoff, J. E. (1990). Neural Network Architectures: An Introduction. Van Nostrand Reinhold: New York. Comments from readers of comp.ai.neuralnets: "Like Wasserman's book, Dayhoff's book is also very easy to understand".

Freeman, James (1994). Simulating Neural Networks with Mathematica, Addison-Wesley, ISBN: 020156629X. Book Webpage (Publisher): http://cseng.aw.com/bookdetail.qry?ISBN=020156629X&ptype=0 Additional Information: Source code available under: ftp://ftp.mathsource.com/pub/Publications/BookSupplements/Freeman1993 Helps the reader make his own NNs. The Mathematica code for the programs in the book is also available through the internet: Send mail to MathSource@wri.com or try http://www.wri.com/ on the World Wide Web.

Freeman, J.A. and Skapura, D.M. (1991). Neural Networks: Algorithms, Applications, and Programming Techniques, Reading, MA: Addison-Wesley. Book Webpage (Publisher): http://www.awl.com/ Additional Information: Seems to be out of print. A good book for beginning programmers who want to learn how to write NN programs while avoiding any understanding of what NNs do or why they do it.

Gately, E. (1996). Neural Networks for Financial Forecasting. New York: John Wiley and Sons, Inc. Book Webpage (Publisher): http://www.wiley.com/ Additional Information: One has to search.
Franco Insana comments:
* Decent book for the neural net beginner
* Very little devoted to statistical framework, although there is some formulation of backprop theory
* Some food for thought
* Nothing here for those with any neural net experience

McClelland, J. L. and Rumelhart, D. E. (1988). Explorations in Parallel Distributed Processing: Computational Models of Cognition and Perception (software manual). The MIT Press. Book Webpage (Publisher): http://mitpress.mit.edu/bookhome.tcl?isbn=026263113X (IBM version) and http://mitpress.mit.edu/bookhome.tcl?isbn=0262631296 (Macintosh) Comments from readers of comp.ai.neuralnets: "Written in a tutorial style, and includes 2 diskettes of NN simulation programs that can be compiled on MS-DOS or Unix (and they do too!)"; "The programs are pretty reasonable as an introduction to some of the things that NNs can do."; "There are *two* editions of this book. One comes with disks for the IBM PC, the other comes with disks for the Macintosh".

McCord Nelson, M. and Illingworth, W.T. (1990). A Practical Guide to Neural Nets. Addison-Wesley Publishing Company, Inc. (ISBN 0201523760). Book Webpage (Publisher): http://cseng.aw.com/bookdetail.qry?ISBN=0201633787&ptype=1174 Lots of applications without technical details, lots of hype, lots of goofs, no formulas.

Muller, B., Reinhardt, J., Strickland, M. T. (1995). Neural Networks: An Introduction (2nd ed.). Berlin, Heidelberg, New York: Springer-Verlag. ISBN 3540602070. (DOS 3.5" disk included.) Book Webpage (Publisher): http://www.springer.de/catalog/htmlfiles/deutsch/phys/3540602070.html Comments from readers of comp.ai.neuralnets: "The book was developed out of a course on neural-network models with computer demonstrations that was taught by the authors to Physics students. The book comes together with a PC-diskette. The book is divided into three parts: (1) Models of Neural Networks; describing several architectures and learning rules, including the mathematics.
(2) Statistical Physics of Neural Networks; "hard-core" physics section developing formal theories of stochastic neural networks. (3) Computer Codes; explanation about the demonstration programs. First part gives a nice introduction into neural networks together with the formulas. Together with the demonstration programs a 'feel' for neural networks can be developed."

Orchard, G.A. & Phillips, W.A. (1991). Neural Computation: A Beginner's Guide. Lawrence Erlbaum Associates: London. Comments from readers of comp.ai.neuralnets: "Short user-friendly introduction to the area, with a non-technical flavour. Apparently accompanies a software package, but I haven't seen that yet".

Rao, V.B, and Rao, H.V. (1993). C++ Neural Networks and Fuzzy Logic. MIS:Press, ISBN 155828298x, US $45 incl. disks. Covers a wider variety of networks than Masters (1993), but is shallow and lacks Masters's insight into practical issues of using NNs.

Wasserman, P. D. (1989). Neural Computing: Theory & Practice. Van Nostrand Reinhold: New York. (ISBN 0442207433) This is not as bad as some books on NNs. It provides an elementary account of the mechanics of a variety of networks. But it provides no insight into why various methods behave as they do, or under what conditions a method will or will not work well. It has no discussion of efficient training methods such as RPROP or conventional numerical optimization techniques. And, most egregiously, it has no explanation of overfitting and generalization beyond the patently false statement on p. 2 that "It is important to note that the artificial neural network generalizes automatically as a result of its structure"! There is no mention of training, validation, and test sets, or of other methods for estimating generalization error. There is no practical advice on the important issue of choosing the number of hidden units. There is no discussion of early stopping or weight decay.
The reader will come away from this book with a grossly oversimplified view of NNs and no concept whatsoever of how to use NNs for practical applications. Comments from readers of comp.ai.neuralnets: "Wasserman flatly enumerates some common architectures from an engineer's perspective ('how it works') without ever addressing the underlying fundamentals ('why it works') - important basic concepts such as clustering, principal components or gradient descent are not treated. It's also full of errors, and unhelpful diagrams drawn with what appears to be PCB board layout software from the '70s. For anyone who wants to do active research in the field I consider it quite inadequate"; "Okay, but too shallow"; "Quite easy to understand"; "The best bedtime reading for Neural Networks. I have given this book to numerous colleagues who want to know NN basics, but who never plan to implement anything. An excellent book to give your manager."

Not-quite-so-introductory Literature
++++++++++++++++++++++++++++++++++++

Kung, S.Y. (1993). Digital Neural Networks, Prentice Hall, Englewood Cliffs, NJ. Book Webpage (Publisher): http://www.prenhall.com/books/ptr_0136123260.html

Levine, D. S. (2000). Introduction to Neural and Cognitive Modeling. 2nd ed., Lawrence Erlbaum: Hillsdale, N.J. Comments from readers of comp.ai.neuralnets: "Highly recommended".

Maren, A., Harston, C. and Pap, R., (1990). Handbook of Neural Computing Applications. Academic Press. ISBN: 0124712606. (451 pages) Comments from readers of comp.ai.neuralnets: "They cover a broad area"; "Introductory with suggested applications implementation".

Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks. Addison-Wesley Publishing Company, Inc. (ISBN 0201125846) Book Webpage (Publisher): http://www.awl.com/ Comments from readers of comp.ai.neuralnets: "An excellent book that ties together classical approaches to pattern recognition with Neural Nets. Most other NN books do not even mention conventional approaches."
Refenes, A. (Ed.) (1995). Neural Networks in the Capital Markets. Chichester, England: John Wiley and Sons, Inc. Book Webpage (Publisher): http://www.wiley.com/ Additional Information: One has to search. Franco Insana comments:
* Not for the beginner
* Excellent introductory material presented by editor in first 5 chapters, which could be a valuable reference source for any practitioner
* Very thought-provoking
* Mostly backprop-related
* Most contributors lay good statistical foundation
* Overall, a wealth of information and ideas, but the reader has to sift through it all to come away with anything useful

Simpson, P. K. (1990). Artificial Neural Systems: Foundations, Paradigms, Applications and Implementations. Pergamon Press: New York. Comments from readers of comp.ai.neuralnets: "Contains a very useful 37 page bibliography. A large number of paradigms are presented. On the negative side the book is very shallow. Best used as a complement to other books".

Wasserman, P.D. (1993). Advanced Methods in Neural Computing. Van Nostrand Reinhold: New York (ISBN: 0442004613). Comments from readers of comp.ai.neuralnets: "Several neural network topics are discussed e.g. Probabilistic Neural Networks, Backpropagation and beyond, neural control, Radial Basis Function Networks, Neural Engineering. Furthermore, several subjects related to neural networks are mentioned e.g. genetic algorithms, fuzzy logic, chaos. Just the functionality of these subjects is described; enough to get you started. Lots of references are given to more elaborate descriptions. Easy to read, no extensive mathematical background necessary."

Zeidenberg, M. (1990). Neural Networks in Artificial Intelligence. Ellis Horwood, Ltd., Chichester. Comments from readers of comp.ai.neuralnets: "Gives the AI point of view".

Zornetzer, S. F., Davis, J. L. and Lau, C. (1990). An Introduction to Neural and Electronic Networks. Academic Press.
(ISBN 0127818812) Comments from readers of comp.ai.neuralnets: "Covers quite a broad range of topics (collection of articles/papers)."; "Provides a primer-like introduction and overview for a broad audience, and employs a strong interdisciplinary emphasis".

Zurada, Jacek M. (1992). Introduction To Artificial Neural Systems. Hardcover, 785 Pages, 317 Figures, ISBN 053495460X, 1992, PWS Publishing Company, Price: $56.75 (includes shipping, handling, and the ANS software diskette). Solutions Manual available. Comments from readers of comp.ai.neuralnets: "Cohesive and comprehensive book on neural nets; as an engineering-oriented introduction, but also as a research foundation. Thorough exposition of fundamentals, theory and applications. Training and recall algorithms appear in boxes showing steps of algorithms, thus making programming of learning paradigms easy. Many illustrations and intuitive examples. Winner among NN textbooks at a senior UG/first year graduate level [175 problems]." Contents: Intro, Fundamentals of Learning, Single-Layer & Multilayer Perceptron NN, Assoc. Memories, Self-organizing and Matching Nets, Applications, Implementations, Appendix.

Books with Source Code (C, C++)
+++++++++++++++++++++++++++++++

Blum, Adam (1992), Neural Networks in C++, Wiley.

Review by Ian Cresswell. (For a review of the text, see "The Worst" below.)

Mr Blum has not only contributed a masterpiece of NN inaccuracy but also seems to lack a fundamental understanding of Object Orientation. The excessive use of virtual methods (see page 32 for example), the inclusion of unnecessary 'friend' relationships (page 133) and a penchant for operator overloading (pick a page!) demonstrate inability in C++ and/or OO. The introduction to OO that is provided trivialises the area and demonstrates a distinct lack of direction and/or understanding. The public interfaces to classes are overspecified and the design relies upon the flawed neuron/layer/network model.
There is a notable disregard for any notion of a robust class hierarchy which is demonstrated by an almost total lack of concern for inheritance and associated reuse strategies. The attempt to rationalise differing types of Neural Network into a single very shallow but wide class hierarchy is naive. The general use of the 'float' data type would cause serious hassle if this software could possibly be extended to use some of the more sensitive variants of backprop on more difficult problems. It is a matter of great fortune that such software is unlikely to be reusable and will therefore, like all good dinosaurs, disappear with the passage of time. The irony is that there is a card in the back of the book asking the unfortunate reader to part with a further $39.95 for a copy of the software (already included in print) on a 5.25" disk. The author claims that his work provides an 'Object Oriented Framework ...'. This can best be put in his own terms (Page 137): ... garble(float noise) ...

Swingler, K. (1996), Applying Neural Networks: A Practical Guide, London: Academic Press.

Review by Ian Cresswell. (For a review of the text, see "The Worst" below.)

Before attempting to review the code associated with this book it should be clearly stated that it is supplied as an extra--almost as an afterthought. This may be a wise move. Although not as bad as other (even commercial) implementations, the code provided lacks proper OO structure and is typical of C++ written in a C style. Style criticisms include:
1. The use of public data fields within classes (loss of encapsulation).
2. Classes with no protected or private sections.
3. Little or no use of inheritance and/or runtime polymorphism.
4. Use of floats not doubles (a common mistake) to store values for connection weights.
5. Overuse of classes and public methods. The network class has 59 methods in its public section.
6. Lack of planning is evident for the construction of a class hierarchy.
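The point in item 4 about floats versus doubles can be illustrated with a minimal sketch. This is not code from either book under review: the helper names `to_float32` and `accumulate`, the starting weight of 1.0, and the update size of 1e-8 are all assumptions chosen for illustration. Single precision is emulated in Python by rounding through `ctypes.c_float` after every update.

```python
import ctypes


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return ctypes.c_float(x).value


def accumulate(weight, update, steps, single_precision):
    """Apply the same small update to a weight many times.

    When single_precision is True, the weight is rounded to float32 after
    every step, as it would be if it were stored in a C 'float'.
    (Hypothetical helper, for illustration only.)
    """
    for _ in range(steps):
        weight = weight + update
        if single_precision:
            weight = to_float32(weight)
    return weight


# Assumed values: a weight near 1.0 receiving 1000 updates of size 1e-8.
w32 = accumulate(1.0, 1e-8, 1000, single_precision=True)
w64 = accumulate(1.0, 1e-8, 1000, single_precision=False)

print("float weight: ", repr(w32))
print("double weight:", repr(w64))
```

With these assumed values the single-precision weight never moves, because 1e-8 is less than half the spacing between adjacent single-precision numbers near 1.0 (about 1.2e-7), while the double-precision weight advances by roughly 1e-5.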
This code is without doubt written by a rushed C programmer. Whilst it would require a C++ compiler to be successfully used, it lacks the tight (optimised) nature of good C and the high level of abstraction of good C++. In a generous sense the code is free and the author doesn't claim any expertise in software engineering. It works in a limited sense but would be difficult to extend and/or reuse. It's fine for demonstration purposes in a standalone manner and for use with the book concerned. If you're serious about nets you'll end up rewriting the whole lot (or getting something better).

The Worst
+++++++++

How not to use neural nets in any programming language

Blum, Adam (1992), Neural Networks in C++, NY: Wiley.

Welstead, Stephen T. (1994), Neural Network and Fuzzy Logic Applications in C/C++, NY: Wiley.

(For a review of Blum's source code, see "Books with Source Code" above.)

Both Blum and Welstead contribute to the dangerous myth that any idiot can use a neural net by dumping in whatever data are handy and letting it train for a few days. They both have little or no discussion of generalization, validation, and overfitting. Neither provides any valid advice on choosing the number of hidden nodes. If you have ever wondered where these stupid "rules of thumb" that pop up frequently come from, here's a source for one of them: "A rule of thumb is for the size of this [hidden] layer to be somewhere between the input layer size ... and the output layer size ..." Blum, p. 60. (John Lazzaro tells me he recently "reviewed a paper that cited this rule of thumb--and referenced this book! Needless to say, the final version of that paper didn't include the reference!")

Blum offers some profound advice on choosing inputs: "The next step is to pick as many input factors as possible that might be related to [the target]." Blum also shows a deep understanding of statistics: "A statistical model is simply a more indirect way of learning correlations.
With a neural net approach, we model the problem directly." p. 8.

Blum at least mentions some important issues, however simplistic his advice may be. Welstead just ignores them. What Welstead gives you is code--vast amounts of code. I have no idea how anyone could write that much code for a simple feedforward NN. Welstead's approach to validation, in his chapter on financial forecasting, is to reserve two cases for the validation set!

My comments apply only to the text of the above books. I have not examined or attempted to compile the code.

An impractical guide to neural nets

Swingler, K. (1996), Applying Neural Networks: A Practical Guide, London: Academic Press.

(For a review of the source code, see "Books with Source Code" above.)

This book has lots of good advice liberally sprinkled with errors, incorrect formulas, some bad advice, and some very serious mistakes. Experts will learn nothing, while beginners will be unable to separate the useful information from the dangerous. For example, there is a chapter on "Data encoding and recoding" that would be very useful to beginners if it were accurate, but the formula for the standard deviation is wrong, and the description of the softmax function is of something entirely different than softmax (see What is a softmax activation function?). Even more dangerous is the statement on p. 28 that "Any pair of variables with high covariance are dependent, and one may be chosen to be discarded." Although high correlations can be used to identify redundant inputs, it is incorrect to use high covariances for this purpose, since a covariance can be high simply because one of the inputs has a high standard deviation.

The most ludicrous thing I've found in the book is the claim that Hecht-Nielsen used Kolmogorov's theorem to show that "you will never require more than twice the number of hidden units as you have inputs" (p. 53) in an MLP with one hidden layer.
Actually, Hecht-Nielsen says "the direct usefulness of this result is doubtful, because no constructive method for developing the [output activation] functions is known." Then Swingler implies that V. Kurkova (1991, "Kolmogorov's theorem is relevant," Neural Computation, 3, 617-622) confirmed this alleged upper bound on the number of hidden units, saying that, "Kurkova was able to restate Kolmogorov's theorem in terms of a set of sigmoidal functions." If Kolmogorov's theorem, or Hecht-Nielsen's adaptation of it, could be restated in terms of known sigmoid activation functions in the (single) hidden and output layers, then Swingler's alleged upper bound would be correct, but in fact no such restatement of Kolmogorov's theorem is possible, and Kurkova did not claim to prove any such restatement. Swingler omits the crucial details that Kurkova used two hidden layers, staircase-like activation functions (not ordinary sigmoidal functions such as the logistic) in the first hidden layer, and a potentially large number of units in the second hidden layer. Kurkova later estimated the number of units required for uniform approximation within an error epsilon as nm(m+1) in the first hidden layer and m^2(m+1)^n in the second hidden layer, where n is the number of inputs and m "depends on epsilon/f as well as on the rate with which f increases distances." In other words, Kurkova says nothing to support Swingler's advice (repeated on p. 55), "Never choose h to be more than twice the number of input units." Furthermore, constructing a counterexample to Swingler's advice is trivial: use one input and one output, where the output is the sine of the input, and the domain of the input extends over many cycles of the sine wave; it is obvious that many more than two hidden units are required. For some sound information on choosing the number of hidden units, see How many hidden units should I use?
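To get a feel for the gap between Kurkova's estimates quoted above and Swingler's limit of twice the number of inputs, the two formulas can be evaluated side by side. The following is only a sketch of that arithmetic: the function names are made up here, and the values of m are hypothetical, since m depends on the target function and the error tolerance and is not given in the text.

```python
def kurkova_unit_counts(n, m):
    """Unit counts quoted in the text for Kurkova's two-hidden-layer construction.

    n is the number of inputs; m is the quantity that depends on the
    approximation error and the target function (values below are assumed).
    """
    first_hidden = n * m * (m + 1)          # nm(m+1)
    second_hidden = m ** 2 * (m + 1) ** n   # m^2(m+1)^n
    return first_hidden, second_hidden


def swingler_limit(n):
    """Swingler's advised maximum number of hidden units: twice the inputs."""
    return 2 * n


if __name__ == "__main__":
    # Hypothetical (n, m) pairs, chosen only to show how the counts grow.
    for n, m in [(1, 2), (2, 3), (5, 4), (10, 5)]:
        first, second = kurkova_unit_counts(n, m)
        print(f"n={n:2d} m={m}: first layer {first}, second layer {second}, "
              f"Swingler's limit {swingler_limit(n)}")
```

Even for small assumed values of m, the second hidden layer grows exponentially in the number of inputs, which is nothing like a bound of 2n units.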
Choosing the number of hidden units is one important aspect of getting good generalization, which is the most crucial issue in neural network training. There are many other considerations involved in getting good generalization, and Swingler makes several more mistakes in this area:

o There is dangerous misinformation on p. 55, where Swingler says, "If a data set contains no noise, then there is no risk of overfitting as there is nothing to overfit." It is true that overfitting is more common with noisy data, but severe overfitting can occur with noise-free data, even when there are more training cases than weights. There is an example of such overfitting under How many hidden layers should I use?

o Regarding the use of added noise (jitter) in training, Swingler says on p. 60, "The more noise you add, the more general your model becomes." This statement makes no sense as it stands (it would make more sense if "general" were changed to "smooth"), but it could certainly encourage a beginner to use far too much jitter--see What is jitter? (Training with noise).

o On p. 109, Swingler describes leave-one-out cross-validation, which he ascribes to Hecht-Nielsen. But Swingler concludes, "the method provides you with L minus 1 networks to choose from; none of which has been validated properly," completely missing the point that cross-validation provides an estimate of the generalization error of a network trained on the entire training set of L cases--see What are cross-validation and bootstrapping? Also, there are L leave-one-out networks, not L-1.

While Swingler has some knowledge of statistics, his expertise is not sufficient for him to detect that certain articles on neural nets are statistically nonsense. For example, on pp. 139-140 he uncritically reports a method that allegedly obtains error bars by doing a simple linear regression on the target vs. output scores.
To a trained statistician, this method is obviously wrong (and, as usual in this book, the formula for variance given for this method on p. 150 is wrong). On p. 110, Swingler reports an article that attempts to apply bootstrapping to neural nets, but this article is also obviously wrong to anyone familiar with bootstrapping. While Swingler cannot be blamed entirely for accepting these articles at face value, such misinformation provides yet more hazards for beginners.

Swingler addresses many important practical issues, and often provides good practical advice. But the peculiar combination of much good advice with some extremely bad advice, a few examples of which are provided above, could easily seduce a beginner into thinking that the book as a whole is reliable. It is this danger that earns the book a place in "The Worst" list.

Bad science writing

Dewdney, A.K. (1997), Yes, We Have No Neutrons: An Eye-Opening Tour through the Twists and Turns of Bad Science, NY: Wiley.

This book, allegedly an expose of bad science, contains only one chapter of 19 pages on "the neural net debacle" (p. 97). Yet this chapter is so egregiously misleading that the book has earned a place on "The Worst" list. A detailed criticism of this chapter, along with some other sections of the book, can be found at ftp://ftp.sas.com/pub/neural/badscience.html. Other chapters of the book are reviewed in the November, 1997, issue of Scientific American.
Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM
