Top Document: comp.ai.neuralnets FAQ, Part 2 of 7: Learning Previous Document: What is a softmax activation function? Next Document: How do MLPs compare with RBFs? See reader questions & answers on this topic!  Help others by sharing your knowledge Answer by Janne Sinkkonen. Curse of dimensionality (Bellman 1961) refers to the exponential growth of hypervolume as a function of dimensionality. In the field of NNs, curse of dimensionality expresses itself in two related problems: 1. Many NNs can be thought of mappings from an input space to an output space. Thus, loosely speaking, an NN needs to somehow "monitor", cover or represent every part of its input space in order to know how that part of the space should be mapped. Covering the input space takes resources, and, in the most general case, the amount of resources needed is proportional to the hypervolume of the input space. The exact formulation of "resources" and "part of the input space" depends on the type of the network and should probably be based on the concepts of information theory and differential geometry. As an example, think of a vector quantization (VQ). In VQ, a set of units competitively learns to represents an input space (this is like Kohonen's SelfOrganizing Map but without topography for the units). Imagine a VQ trying to share its units (resources) more or less equally over hyperspherical input space. One could argue that the average distance from a random point of the space to the nearest network unit measures the goodness of the representation: the shorter the distance, the better is the represention of the data in the sphere. It is intuitively clear (and can be experimentally verified) that the total number of units required to keep the average distance constant increases exponentially with the dimensionality of the sphere (if the radius of the sphere is fixed). The curse of dimensionality causes networks with lots of irrelevant inputs to be behave relatively badly: the dimension of the input space is high, and the network uses almost all its resources to represent irrelevant portions of the space. Unsupervised learning algorithms are typically prone to this problem  as well as conventional RBFs. A partial remedy is to preprocess the input in the right way, for example by scaling the components according to their "importance". 2. Even if we have a network algorithm which is able to focus on important portions of the input space, the higher the dimensionality of the input space, the more data may be needed to find out what is important and what is not. A priori information can help with the curse of dimensionality. Careful feature selection and scaling of the inputs fundamentally affects the severity of the problem, as well as the selection of the neural network model. For classification purposes, only the borders of the classes are important to represent accurately. References: Bellman, R. (1961), Adaptive Control Processes: A Guided Tour, Princeton University Press. Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press, section 1.4. Scott, D.W. (1992), Multivariate Density Estimation, NY: Wiley. User Contributions:Comment about this article, ask questions, or add new information about this topic:Top Document: comp.ai.neuralnets FAQ, Part 2 of 7: Learning Previous Document: What is a softmax activation function? Next Document: How do MLPs compare with RBFs? Part1  Part2  Part3  Part4  Part5  Part6  Part7  Single Page [ Usenet FAQs  Web FAQs  Documents  RFC Index ] Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM
