comp.ai.neural-nets FAQ, Part 7 of 7: Hardware
Section - How to forecast time series (temporal sequences)?


In most of this FAQ, it is assumed that the training cases are statistically
independent. That is, the training cases consist of pairs of input and
target vectors, (X_i,Y_i), i=1,...,N, such that the conditional
distribution of Y_i given all the other training data, (X_j,
j=1,...,N, and Y_j, j=1,...,i-1,i+1,...,N), is equal to the
conditional distribution of Y_i given X_i regardless of the values in the
other training cases. Independence of cases is often achieved by random
sampling. 

The most common violation of the independence assumption occurs when cases
are observed in a certain order relating to time or space. That is, case 
(X_i,Y_i) corresponds to time T_i, with T_1 < T_2 < ... <
T_N. It is assumed that the current target Y_i may depend not only on 
X_i but also on (X_j,Y_j) for j < i in the recent past. If the T_i are
equally spaced, the simplest way to deal with this dependence is to include
additional inputs (called lagged variables, shift registers, or a tapped
delay line) in the network. Thus, for target Y_i, the inputs may include 
X_i, Y_{i-1}, X_{i-1}, Y_{i-2}, X_{i-2}, etc. (In some
situations, X_i would not be known at the time you are trying to forecast 
Y_i and would therefore be excluded from the inputs.) Then you can train
an ordinary feedforward network with these targets and lagged variables. The
use of lagged variables has been extensively studied in the statistical and
econometric literature (Judge, Griffiths, Hill, Lütkepohl and Lee, 1985). A
network in which the only inputs are lagged target values is called an
"autoregressive model." The input space that includes all of the lagged
variables is called the "embedding space." 
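
As a minimal sketch of the tapped delay line described above, the training
cases for an ordinary feedforward network could be assembled as follows in
Python. The function name make_lagged_cases and its arguments are
illustrative assumptions, not part of any library, and the series is assumed
to be equally spaced.

```python
def make_lagged_cases(y, n_lags, x=None):
    """Build (inputs, target) training cases from an equally spaced series.

    y      : list of target values Y_1..Y_N (index 0 holds Y_1)
    n_lags : number of lagged values to use as inputs
    x      : optional list of inputs X_1..X_N, same length as y
    """
    if x is not None and len(x) != len(y):
        raise ValueError("x and y must have the same length")
    cases = []
    for i in range(n_lags, len(y)):
        # Lagged targets, most recent first: Y_{i-1}, Y_{i-2}, ...
        inputs = [y[i - k] for k in range(1, n_lags + 1)]
        if x is not None:
            # Current and lagged inputs: X_i, X_{i-1}, ...
            # (drop k = 0 if X_i is not known at forecast time)
            inputs += [x[i - k] for k in range(0, n_lags + 1)]
        cases.append((inputs, y[i]))
    return cases


series = [1.0, 2.0, 4.0, 7.0, 11.0]
for inputs, target in make_lagged_cases(series, n_lags=2):
    print(inputs, "->", target)
```

With n_lags=2 and no X series, each case pairs (Y_{i-1}, Y_{i-2}) with the
target Y_i, which is the autoregressive special case.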

If the T_i are not equally spaced, everything gets much more complicated.
One approach is to use a smoothing technique to interpolate points at
equally spaced intervals, and then use the interpolated values for training
instead of the original data. 
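
A rough sketch of that approach is shown below. It uses plain linear
interpolation as the simplest stand-in for a smoothing technique; a
smoothing spline or kernel smoother would be a more careful choice for noisy
data. The name resample_linear is an assumption made for illustration, and
the observation times are assumed to be strictly increasing.

```python
def resample_linear(times, values, step):
    """Linearly interpolate (times, values) onto an equally spaced grid."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    if step <= 0:
        raise ValueError("step must be positive")
    grid_times, grid_values = [], []
    j = 0  # index of the left endpoint of the current interval
    # Number of grid points that fit in [times[0], times[-1]];
    # floating-point rounding can drop the last point for fractional steps.
    n_points = int((times[-1] - times[0]) / step) + 1
    for k in range(n_points):
        t = times[0] + k * step
        # Advance to the interval [times[j], times[j+1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        grid_times.append(t)
        grid_values.append(values[j] + w * (values[j + 1] - values[j]))
    return grid_times, grid_values


# Observations at unequally spaced times 0, 1, 3, 4.
grid_t, grid_y = resample_linear([0, 1, 3, 4], [0, 10, 30, 20], step=1)
print(grid_t)
print(grid_y)
```

The interpolated series can then be passed to the same lagged-variable
construction used for equally spaced data.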

Use of lagged variables increases the number of decisions that must be made
during training, since you must consider which lags to include in the
network, as well as which input variables, how many hidden units, etc.
Neural network researchers have therefore attempted to use partially
recurrent networks instead of feedforward networks with lags (Weigend and
Gershenfeld, 1994). Recurrent networks store information about past values
in the network itself. There are many different kinds of recurrent
architectures (Hertz, Krogh, and Palmer 1991; Mozer, 1994; Horne and Giles,
1995; Kremer, 199?). For example, in time-delay neural networks (Lang,
Waibel, and Hinton 1990), the outputs for predicting target Y_{i-1} are
used as inputs when processing target Y_i. Jordan networks (Jordan, 1986)
are similar to time-delay neural networks except that the feedback is an
exponential smooth of the sequence of output values. In Elman networks
(Elman, 1990), the hidden unit activations that occur when processing target
Y_{i-1} are used as inputs when processing target Y_i. 
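
To make the Elman idea concrete, here is a small sketch of the forward pass
only, for a network with one input and one output. The weights are arbitrary
illustrative numbers rather than trained values, the tanh hidden units and
linear output are assumptions, and training (for example by backpropagation
through time) is not shown.

```python
import math


def elman_forward(series, w_in, w_ctx, b_hid, w_out, b_out):
    """Run a one-input, one-output Elman network over a sequence.

    w_in[h]     : weight from the input to hidden unit h
    w_ctx[h][g] : weight from context unit g to hidden unit h
    b_hid[h]    : bias of hidden unit h
    w_out[h]    : weight from hidden unit h to the output
    b_out       : output bias
    """
    n_hidden = len(w_in)
    context = [0.0] * n_hidden  # no history before the first case
    outputs = []
    for value in series:
        hidden = []
        for h in range(n_hidden):
            net_h = b_hid[h] + w_in[h] * value
            net_h += sum(w_ctx[h][g] * context[g] for g in range(n_hidden))
            hidden.append(math.tanh(net_h))
        outputs.append(b_out + sum(w_out[h] * hidden[h] for h in range(n_hidden)))
        # The hidden activations become the context inputs for the next case.
        context = hidden
    return outputs


# Same input value twice: the second output differs because of the context.
print(elman_forward([1.0, 1.0], w_in=[0.5], w_ctx=[[0.8]],
                    b_hid=[0.0], w_out=[1.0], b_out=0.0))
```

Because the context units carry the previous hidden activations, the same
input can produce different outputs at different positions in the sequence,
which is how the network stores information about past values.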

However, there are some problems that cannot be dealt with via recurrent
networks alone. For example, many time series exhibit trend, meaning that
the target values tend to go up over time, or that the target values tend to
go down over time. Stock prices and many other financial variables, for
instance, tend to rise over the long run. If today's price is higher than
all previous prices, and you try to forecast tomorrow's price using today's
price as a lagged input, you are extrapolating beyond the range of the
training data, and extrapolating is unreliable. The simplest methods for
handling trend are: 

 o First fit a linear regression predicting the target values from the time,
   Y_i = a + b T_i + noise, where a and b are regression
   weights. Compute residuals R_i = Y_i - (a + b T_i). Then
   train the network using R_i for the target and lagged values. This
   method is rather crude but may work for deterministic linear trends. Of
   course, for nonlinear trends, you would need to fit a nonlinear
   regression. 

 o Instead of using Y_i as a target, use D_i = Y_i - Y_{i-1} for
   the target and lagged values. This is called differencing and is the
   standard statistical method for handling nondeterministic (stochastic)
   trends. Sometimes it is necessary to compute differences of differences. 
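
Both methods can be sketched in a few lines of Python. The function names
below are assumptions chosen for illustration. The undifference helper shows
the step that is easy to forget: forecasts made on differences must be
summed back up to recover forecasts of the original series.

```python
def fit_linear_trend(times, y):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(y)
    mean_t = sum(times) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times, y))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def detrend(times, y):
    """Return residuals R_i = Y_i - (a + b T_i) and the fitted a, b."""
    a, b = fit_linear_trend(times, y)
    residuals = [v - (a + b * t) for t, v in zip(times, y)]
    return residuals, a, b


def difference(y):
    """Return D_i = Y_i - Y_{i-1}; the result is one element shorter."""
    return [y[i] - y[i - 1] for i in range(1, len(y))]


def undifference(last_value, forecast_diffs):
    """Turn forecast differences back into levels by cumulative summation."""
    levels = []
    current = last_value
    for d in forecast_diffs:
        current += d
        levels.append(current)
    return levels


times = [1, 2, 3, 4]
y = [3.0, 5.0, 7.0, 9.0]
print(detrend(times, y))
print(difference(y))
print(difference(difference(y)))  # differences of differences
print(undifference(y[-1], [2.0, 2.0]))
```

For the detrending method, a forecast of the residual is converted back by
adding a + b T at the forecast time.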

For an elementary discussion of trend and various other practical problems
in forecasting time series with NNs, such as seasonality, see Masters
(1993). For a more advanced discussion of NN forecasting of economic series,
see Moody (1998). 

There are several different ways to compute forecasts. For simplicity, let's
assume you have a simple time series, Y_1, ..., Y_99, you want to
forecast future values Y_f for f > 99, and you decide to use three
lagged values as inputs. The possibilities include: 

Single-step, one-step-ahead, or open-loop forecasting: 
   Train a network with target Y_i and inputs Y_{i-1}, Y_{i-2},
   and Y_{i-3}. Let the scalar function computed by the network be
   designated as Net(.,.,.) taking the three input values as arguments
   and returning the output (predicted) value. Then:
   forecast Y_100 as Net(Y_99,Y_98,Y_97)
   forecast Y_101 as Net(Y_100,Y_99,Y_98)
   forecast Y_102 as Net(Y_101,Y_100,Y_99)
   forecast Y_103 as Net(Y_102,Y_101,Y_100)
   forecast Y_104 as Net(Y_103,Y_102,Y_101)
   and so on. 

Multi-step or closed-loop forecasting: 
   Train the network as above, but:
   forecast Y_100 as P_100 = Net(Y_99,Y_98,Y_97)
   forecast Y_101 as P_101 = Net(P_100,Y_99,Y_98)
   forecast Y_102 as P_102 = Net(P_101,P_100,Y_99)
   forecast Y_103 as P_103 = Net(P_102,P_101,P_100)
   forecast Y_104 as P_104 = Net(P_103,P_102,P_101)
   and so on. 

N-step-ahead forecasting: 
   For, say, N=3, train the network as above, but:
   compute P_100 = Net(Y_99,Y_98,Y_97)
   compute P_101 = Net(P_100,Y_99,Y_98)
   forecast Y_102 as P_102 = Net(P_101,P_100,Y_99)
   forecast Y_103 as P_103 = Net(P_102,P_101,Y_100)
   forecast Y_104 as P_104 = Net(P_103,P_102,Y_101)
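   and so on. Each forecast uses actual values only up to three steps
   before the time being forecast; the P values among the inputs are
   closed-loop predictions recomputed from those actual values. For
   example, in forecasting Y_103, P_101 = Net(Y_100,Y_99,Y_98) and
   P_102 = Net(P_101,Y_100,Y_99).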

Direct simultaneous long-term forecasting: 
   Train a network with multiple targets Y_i, Y_{i+1}, and Y_{i+2}
   and inputs Y_{i-1}, Y_{i-2}, and Y_{i-3}. Let the vector
   function computed by the network be designated as Net3(.,.,.),
   taking the three input values as arguments and returning the output
   (predicted) vector. Then:
   forecast (Y_100,Y_101,Y_102) as Net3(Y_99,Y_98,Y_97)
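
The four schemes above differ only in which values are fed to the network,
which the following Python sketch tries to make explicit. The function net
here is a made-up linear stand-in for a trained network, not a real model,
and all function names are assumptions. Lists are indexed from 0, so y[0]
holds Y_1.

```python
def net(y1, y2, y3):
    """Hypothetical stand-in for a trained Net(Y_{i-1}, Y_{i-2}, Y_{i-3})."""
    return 0.5 * y1 + 0.3 * y2 + 0.2 * y3


def closed_loop(net, history, n_steps):
    """Multi-step forecasts: each prediction is fed back as an input."""
    if len(history) < 3:
        raise ValueError("need at least three known values")
    window = list(history[-3:])  # oldest first
    forecasts = []
    for _ in range(n_steps):
        p = net(window[2], window[1], window[0])  # most recent value first
        forecasts.append(p)
        window = [window[1], window[2], p]
    return forecasts


def n_step_ahead(net, y, n, first):
    """Forecast y[t] for t = first..len(y)-1 from actual values up to y[t-n]."""
    if n < 1 or first - n < 2:
        raise ValueError("not enough actual values before the first forecast")
    return [closed_loop(net, y[:t - n + 1], n)[-1] for t in range(first, len(y))]


def open_loop(net, y, first):
    """Single-step forecasting is the N = 1 case: only actual values are used."""
    return n_step_ahead(net, y, 1, first)


def direct_cases(y, n_lags=3, horizon=3):
    """Training cases for direct forecasting: lagged inputs, vector targets."""
    cases = []
    for i in range(n_lags, len(y) - horizon + 1):
        inputs = [y[i - k] for k in range(1, n_lags + 1)]
        targets = [y[i + k] for k in range(horizon)]
        cases.append((inputs, targets))
    return cases


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(open_loop(net, y, first=5))
print(closed_loop(net, y[:5], n_steps=3))
print(n_step_ahead(net, y, n=3, first=5))
print(direct_cases(y)[0])
```

Writing single-step forecasting as the N = 1 case of N-step-ahead
forecasting is a design choice made here for brevity; it also makes clear
that the same trained network serves all three of the first schemes, while
direct forecasting needs a network trained on vector targets.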

Which method you choose for computing forecasts will obviously depend in
part on the requirements of your application. If you have yearly sales
figures through 1999 and you need to forecast sales in 2003, you clearly
can't use single-step forecasting. If you need to compute forecasts at a
thousand different future times, using direct simultaneous long-term
forecasting would require an extremely large network. 

If a time series is a random walk, a well-trained network will predict Y_i
by simply outputting Y_{i-1}. If you make a plot showing both the target
values and the outputs, the two curves will almost coincide, except for
being offset by one time step. People often mistakenly interpret such a plot
to indicate good forecasting accuracy, whereas in fact the network is
virtually useless. In such situations, it is more enlightening to plot
multi-step forecasts or N-step-ahead forecasts. 
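
The effect is easy to reproduce with a simulated random walk. In the sketch
below, the "network" is replaced by the persistence rule it would converge
to (output the last known value); the function names and the simulation
settings are assumptions made for illustration.

```python
import random


def random_walk(n, seed=0):
    """Simulate a random walk with independent unit-variance Gaussian steps."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def persistence_mse(y, n_ahead):
    """MSE of forecasting y[t] by the value observed n_ahead steps earlier."""
    return mse(y[:-n_ahead], y[n_ahead:])


def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


y = random_walk(2000)
# One-step "forecasts" track the level closely, so the plot looks impressive.
print("correlation of output with target:", correlation(y[:-1], y[1:]))
# The error grows with the horizon, roughly in proportion to it.
for n_ahead in (1, 5, 20):
    print(n_ahead, "steps ahead, MSE:", persistence_mse(y, n_ahead))
```

A forecasting network is worth something only if it beats this persistence
rule at the horizon you care about, so the persistence error makes a useful
baseline to report next to the network's error.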

For general information on time-series forecasting, see the following URLs: 

 o Forecasting FAQs: http://forecasting.cwru.edu/faqs.html 
 o Forecasting Principles: http://hops.wharton.upenn.edu/forecast/ 
 o Investment forecasts for stocks and mutual funds: 
   http://www.coe.uncc.edu/~hphillip/ 

References: 

   Elman, J.L. (1990), "Finding structure in time," Cognitive Science, 14,
   179-211. 

   Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of
   Neural Computation. Addison-Wesley: Redwood City, California. 

   Horne, B. G. and Giles, C. L. (1995), "An experimental comparison of
   recurrent neural networks," In Tesauro, G., Touretzky, D., and Leen, T.,
   editors, Advances in Neural Information Processing Systems 7, pp.
   697-704. The MIT Press. 

   Jordan, M. I. (1986), "Attractor dynamics and parallelism in a
   connectionist sequential machine," In Proceedings of the Eighth Annual
   Conference of the Cognitive Science Society, pages 531-546. Lawrence
   Erlbaum. 

   Judge, G.G., Griffiths, W.E., Hill, R.C., Lütkepohl, H., and Lee, T.-C.
   (1985), The Theory and Practice of Econometrics, NY: John Wiley & Sons. 

   Kremer, S.C. (199?), "Spatio-temporal Connectionist Networks: A Taxonomy
   and Review," 
   http://hebb.cis.uoguelph.ca/~skremer/Teaching/27642/dynamic2/review.html.

   Lang, K. J., Waibel, A. H., and Hinton, G. (1990), "A time-delay neural
   network architecture for isolated word recognition," Neural Networks, 3,
   23-44. 

   Masters, T. (1993). Practical Neural Network Recipes in C++, San Diego:
   Academic Press. 

   Moody, J. (1998), "Forecasting the economy with neural nets: A survey of
   challenges and solutions," in Orr, G.B., and Mueller, K.-R., eds., Neural
   Networks: Tricks of the Trade, Berlin: Springer. 

   Mozer, M.C. (1994), "Neural net architectures for temporal sequence
   processing," in Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time
   Series Prediction: Forecasting the Future and Understanding the Past,
   Reading, MA: Addison-Wesley, 243-264, 
   http://www.cs.colorado.edu/~mozer/papers/timeseries.html. 

   Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction:
   Forecasting the Future and Understanding the Past, Reading, MA:
   Addison-Wesley.

Send corrections/additions to the FAQ Maintainer:
saswss@unx.sas.com (Warren Sarle)

Last Update March 27 2014 @ 02:11 PM
