You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format. However, this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Department of Computer Science of the University of Maryland at College Park under terms that include this permission. All other rights are reserved by the author(s).
Alternative Discrete-Time Operators and Their Application to Nonlinear. Andrew D. Back. Ah Chung Tsoi. Bill G. Horne. C. Lee Giles. January 1997.
The shift operator, defined as q x(t) = x(t+1), is the basis for almost all discrete-time models. It has been shown, however, that linear models based on the shift operator suffer problems when used to model lightly damped, low-frequency (LDLF) systems, with poles near (1,0) on the unit circle in the complex plane. This problem occurs under fast sampling conditions. As the sampling rate increases, coefficient sensitivity and round-off noise become a problem as the difference between successive sampled inputs becomes smaller and smaller. The resulting coefficients of the model approach the coefficients obtained in a binomial expansion, regardless of the underlying continuous-time system. This implies that for a given finite wordlength, severe inaccuracies may result. Wordlengths for the coefficients may also need to be made longer to accommodate models which have low frequency characteristics, corresponding to poles in the neighbourhood of (1,0). These problems also arise in neural network models which consist of linear parts and nonlinear neural activation functions. Various alternative discrete-time operators can be introduced which offer numerical computational advantages over the conventional shift operator. The alternative discrete-time operators have been proposed independently of each other in the fields of digital filtering, adaptive control and neural networks. These include the delta, rho, gamma and bilinear operators. In this paper we first review these operators and examine some of their properties. An analysis of the TDNN and FIR MLP network structures is given which shows their susceptibility to parameter sensitivity problems. Subsequently, it is shown that models may be formulated using alternative discrete-time operators which have low sensitivity properties. Consideration is given to the problem of finding parameters for stable alternative discrete-time operators.
A learning algorithm which adapts the parameters of the alternative discrete-time operators on-line is presented for MLP neural network models based on these operators. It is shown that neural network models which use these alternative discrete-time operators perform better than those using the shift operator alone. (Also cross-referenced as UMIACS-TR-97-03) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Laboratory for Artificial Brain Systems, Institute of Physical and, Faculty of Informatics, University of Wollongong, Australia, AADM Consulting, Califon, NJ, NEC Research Institute, Princeton, NJ,
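The fast-sampling behaviour described in the abstract above can be illustrated numerically. The following is a minimal sketch and not the report's own method. It assumes a hypothetical third-order continuous-time system with poles at -1, -2 and -3, maps those poles to discrete time with z = exp(sT), and uses the standard delta operator definition delta = (q - 1)/T, which the abstract names but does not define. The function name `shift_and_delta_coeffs` is made up for this illustration.

```python
import numpy as np

def shift_and_delta_coeffs(cont_poles, T):
    """Return monic denominator coefficients in the shift and delta domains.

    cont_poles: continuous-time poles (assumed example values, not from the report)
    T: sampling period
    """
    # Discrete-time poles of the sampled system: z = exp(s * T).
    z = np.exp(np.asarray(cont_poles, dtype=complex) * T)
    # Shift-operator denominator: polynomial in q with roots z.
    shift = np.real(np.poly(z))
    # Delta-operator denominator: delta = (q - 1) / T, so roots are (z - 1) / T.
    delta = np.real(np.poly((z - 1.0) / T))
    return shift, delta

# Hypothetical system with poles at -1, -2, -3.
cont_poles = [-1.0, -2.0, -3.0]
for T in (0.1, 0.01, 0.001):
    shift, delta = shift_and_delta_coeffs(cont_poles, T)
    print(f"T={T}: shift={np.round(shift, 4)}, delta={np.round(delta, 4)}")
```

As T shrinks, the shift-operator coefficients move toward the binomial pattern 1, -3, 3, -1 of (q - 1)^3, so the information about the underlying system sits in ever smaller differences from those values. The delta-operator coefficients instead move toward 1, 6, 11, 6, the coefficients of the continuous-time polynomial (s + 1)(s + 2)(s + 3).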
Noisy Time Series Prediction using Symbolic Representation and. Steve Lawrence. Ah Chung Tsoi. C. Lee Giles. April 1996.
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. The symbolic representation aids the extraction of symbolic knowledge from the recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Rules related to well known behavior such as trend following and mean reversal are extracted. (Also cross-referenced as UMIACS-TR-96-27) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
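The symbolic conversion step mentioned in the abstract above can be sketched with a very small one-dimensional self-organizing map. This is only an illustration under stated assumptions: it quantizes scalar day-to-day differences of a synthetic random walk, whereas the abstract does not say what inputs, map size, or training schedule the report uses. The function names `train_som_1d` and `to_symbols` and all parameter values are hypothetical.

```python
import numpy as np

def train_som_1d(x, n_nodes=3, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map on scalar inputs (illustrative settings)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Initialise node weights evenly across the data range.
    w = np.linspace(x.min(), x.max(), n_nodes)
    idx = np.arange(n_nodes)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                          # decaying learning rate
        sigma = max(n_nodes / 2.0 * frac, 0.5)   # shrinking neighbourhood width
        for v in rng.permutation(x):
            bmu = np.argmin(np.abs(w - v))       # best-matching unit
            # Gaussian neighbourhood on the map's index axis.
            h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
            w += lr * h * (v - w)
    return w

def to_symbols(x, w):
    """Map each scalar input to the index of its nearest SOM node."""
    x = np.asarray(x, dtype=float)
    return np.argmin(np.abs(x[:, None] - w[None, :]), axis=1)

# Synthetic random-walk series (not real exchange-rate data).
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=200))
diffs = np.diff(series)
weights = train_som_1d(diffs, n_nodes=3)
symbols = to_symbols(diffs, weights)
print(symbols[:20])
```

Each node index acts as one symbol, so the noisy real-valued series becomes a sequence over a small alphabet that a grammatical inference model could then consume.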
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation. Steve Lawrence. C. Lee Giles. Ah Chung Tsoi. April 1996.
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversize networks can result in lower training and generalization error, and for c) the use of committee or ensemble techniques can be more beneficial as the amount of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks. (Also cross-referenced as UMIACS-TR-96-22) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Face Recognition: A Hybrid Neural Network Approach. Steve Lawrence. C. Lee Giles. Ah Chung Tsoi. Andrew D. Back. April 1996.
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional network. The Karhunen-Loeve transform performs almost as well (5.3% error versus 3.8%). The multi-layer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer.
(Also cross-referenced as UMIACS-TR-96-16) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Last Generated Fri Aug 11 04:01:01 EDT 2000