comp.ai.neural-nets FAQ, Part 2 of 7: Learning
Section - What are OLS and subset/stepwise regression?




If you are a statistician, "OLS" means "ordinary least squares" (as opposed to
weighted or generalized least squares), which is what the NN literature
often calls "LMS" (least mean squares). 

If you are a neural networker, "OLS" means "orthogonal least squares", which
is an algorithm for forward stepwise regression proposed by Chen et al.
(1991) for training RBF networks. 

OLS is a variety of supervised training. But whereas backprop and other
commonly-used supervised methods are forms of continuous optimization, OLS
is a form of combinatorial optimization. Rather than treating the RBF
centers as continuous values to be adjusted to reduce the training error,
OLS starts with a large set of candidate centers and selects a subset that
usually yields a small training error. For small training sets, the
candidates can include all of the training cases. For large training sets,
it is more efficient to use a random subset of the training cases or to do a
cluster analysis and use the cluster means as candidates. 
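The candidate-generation step can be sketched as follows. This is a minimal
illustration in NumPy; the data sizes, the function name kmeans_centers, and
all parameter values are my own assumptions, not from the FAQ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 1000 cases, 2 inputs.
X = rng.normal(size=(1000, 2))

# Small training sets: every training case is a candidate center.
candidates_all = X.copy()

# Large training sets: a random subset of the training cases...
idx = rng.choice(len(X), size=50, replace=False)
candidates_random = X[idx]

# ...or the means of clusters found by a simple k-means pass.
def kmeans_centers(X, k, iters=10, rng=rng):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each case to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Replace each center by the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

candidates_kmeans = kmeans_centers(X, k=50)
```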

Each center corresponds to a predictor variable in a linear regression
model. The values of these predictor variables are computed from the RBF
applied to each center. There are numerous methods for selecting a subset of
predictor variables in regression (Myers 1986; Miller 1990). The ones most
often used are: 

 o Forward selection begins with no centers in the network. At each step the
   center is added that most decreases the objective function. 
 o Backward elimination begins with all candidate centers in the network. At
   each step the center is removed that least increases the objective
   function. 
 o Stepwise selection begins like forward selection with no centers in the
   network. At each step, a center is added or removed. If there are any
   centers in the network, the one that contributes least to reducing the
   objective function is subjected to a statistical test (usually based on
   the F statistic) to see if it is worth retaining in the network; if the
   center fails the test, it is removed. If no centers are removed, then the
   centers that are not currently in the network are examined; the one that
   would contribute most to reducing the objective function is subjected to
   a statistical test to see if it is worth adding to the network; if the
   center passes the test, it is added. When all centers in the network pass
   the test for staying in the network, and all other centers fail the test
   for being added to the network, the stepwise method terminates. 
 o Leaps and bounds (Furnival and Wilson 1974) is an algorithm for
   determining the subset of centers that minimizes the objective function;
   this optimal subset can be found without examining all possible subsets,
   but the algorithm is practical only up to 30 to 50 candidate centers. 
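Plain forward selection (without the orthogonalization that makes OLS
efficient) can be sketched as follows, using Gaussian RBFs of fixed width.
All names, widths, and data here are illustrative assumptions; the objective
function is simply the residual sum of squares, refit from scratch at each
step rather than updated incrementally as OLS would:

```python
import numpy as np

def rbf_design(X, centers, width=1.0):
    # Design matrix: one Gaussian RBF column per candidate center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def forward_select(X, y, candidates, n_centers, width=1.0):
    # Greedy forward selection: at each step add the candidate center
    # that most reduces the residual sum of squares.
    Phi = rbf_design(X, candidates, width)
    chosen = []
    for _ in range(n_centers):
        best_j, best_sse = None, np.inf
        for j in range(len(candidates)):
            if j in chosen:
                continue
            cols = Phi[:, chosen + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            sse = ((y - cols @ coef) ** 2).sum()
            if sse < best_sse:
                best_j, best_sse = j, sse
        chosen.append(best_j)
    return chosen

# Toy problem: the target is a sum of two Gaussian bumps.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
true_centers = np.array([[0.5, 0.5], [-0.5, -0.5]])
y = rbf_design(X, true_centers, 0.4) @ np.array([1.0, -1.0])

# Use all training cases as candidates and pick two centers.
chosen = forward_select(X, y, candidates=X, n_centers=2, width=0.4)
```

Backward elimination and stepwise selection differ only in the loop
structure: elimination starts from all candidates and drops the least
harmful column, while the stepwise method interleaves adds and drops guarded
by an F test.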

OLS is a particular algorithm for forward selection using modified
Gram-Schmidt (MGS) orthogonalization. While MGS is not a bad algorithm, it
is not the best algorithm for linear least-squares (Lawson and Hanson 1974).
For ill-conditioned data (see 
ftp://ftp.sas.com/pub/neural/illcond/illcond.html), Householder and Givens
methods are generally preferred, while for large, well-conditioned data
sets, methods based on the normal equations require about one-third as many
floating point operations and much less disk I/O than OLS. Normal equation
methods based on sweeping (Goodnight 1979) or Gaussian elimination (Furnival
and Wilson 1974) are especially simple to program. 
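As a rough illustration of the trade-off, assuming NumPy (whose
np.linalg.qr calls LAPACK's Householder-based QR factorization): the
normal-equations route forms the small cross-product matrix A'A once and
solves a k-by-k symmetric system, while the orthogonalization route works on
the full n-by-k data matrix. On well-conditioned data the two agree; on
ill-conditioned data the normal equations square the condition number and
lose accuracy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(500, 10))          # large, well-conditioned data
b = A @ np.arange(1.0, 11.0) + 0.01 * rng.normal(size=500)

# Normal equations: cheap for large n, but condition number is squared.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Householder QR: more flops, numerically safer on ill-conditioned A.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)
```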

While the theory of linear models is the most thoroughly developed area of
statistical inference, subset selection invalidates most of the standard
theory (Miller 1990; Roecker 1991; Derksen and Keselman 1992; Freedman, Pee,
and Midthune 1992). 

Subset selection methods usually do not generalize as well as regularization
methods in linear models (Frank and Friedman 1993). Orr (1995) has proposed
combining regularization with subset selection for RBF training (see also
Orr 1996). 

References: 

   Chen, S., Cowan, C.F.N., and Grant, P.M. (1991), "Orthogonal least
   squares learning for radial basis function networks," IEEE Transactions
   on Neural Networks, 2, 302-309. 

   Derksen, S. and Keselman, H. J. (1992) "Backward, forward and stepwise
   automated subset selection algorithms: Frequency of obtaining authentic
   and noise variables," British Journal of Mathematical and Statistical
   Psychology, 45, 265-282.

   Frank, I.E. and Friedman, J.H. (1993) "A statistical view of some
   chemometrics regression tools," Technometrics, 35, 109-148. 

   Freedman, L.S. , Pee, D. and Midthune, D.N. (1992) "The problem of
   underestimating the residual error variance in forward stepwise
   regression", The Statistician, 41, 405-412. 

   Furnival, G.M. and Wilson, R.W. (1974), "Regression by Leaps and Bounds,"
   Technometrics, 16, 499-511. 

   Goodnight, J.H. (1979), "A Tutorial on the SWEEP Operator," The American
   Statistician, 33, 149-158. 

   Lawson, C. L. and Hanson, R. J. (1974), Solving Least Squares Problems,
   Englewood Cliffs, NJ: Prentice-Hall, Inc. (2nd edition: 1995,
   Philadelphia: SIAM) 

   Miller, A.J. (1990), Subset Selection in Regression, Chapman & Hall. 

   Myers, R.H. (1986), Classical and Modern Regression with Applications,
   Boston: Duxbury Press. 

   Orr, M.J.L. (1995), "Regularisation in the selection of radial basis
   function centres," Neural Computation, 7, 606-623. 

   Orr, M.J.L. (1996), "Introduction to radial basis function networks,"
   http://www.cns.ed.ac.uk/people/mark/intro.ps or
   http://www.cns.ed.ac.uk/people/mark/intro/intro.html . 

   Roecker, E.B. (1991) "Prediction error and its estimation for
   subset-selected models," Technometrics, 33, 459-468. 

Send corrections/additions to the FAQ Maintainer:
saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM