Echo state network explained

An echo state network (ESN)[1][2] is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (typically about 1% connectivity). The connectivity and weights of the hidden neurons are fixed and randomly assigned. The weights of the output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main appeal of this network is that, although its behavior is non-linear, the only weights modified during training are those of the synapses connecting the hidden neurons to the output neurons. The error function is therefore quadratic with respect to the parameter vector, and setting its derivative to zero yields a linear system.
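The step from a quadratic error to a linear system can be sketched as follows. The notation is assumed here for illustration and is not taken from the cited sources: X is the matrix of collected hidden-neuron states (one row per time step), Y the matrix of desired outputs, and W_out the output weight matrix.

```latex
E(W_{\text{out}}) = \lVert X W_{\text{out}} - Y \rVert_F^2,
\qquad
\nabla_{W_{\text{out}}} E = 2 X^\top \left( X W_{\text{out}} - Y \right) = 0
\;\Longrightarrow\;
X^\top X \, W_{\text{out}} = X^\top Y .
```

Because X depends only on the fixed reservoir and the input, it is a constant during training, which is why the problem stays linear in W_out even though the reservoir dynamics are non-linear.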

Alternatively, one may consider a nonparametric Bayesian formulation of the output layer, under which: (i) a prior distribution is imposed over the output weights; and (ii) the output weights are marginalized out when generating predictions, given the training data. This idea has been demonstrated in [3] using Gaussian priors, which yields a Gaussian process model with an ESN-driven kernel function. Such a solution was shown to outperform ESNs with trainable (finite) sets of weights in several benchmarks.

Publicly available implementations of ESNs include aureservoir (an efficient C++ library for various kinds of ESNs, with Python/NumPy bindings), MATLAB code, ReservoirComputing.jl (a Julia-based implementation of various types of ESNs), and pyESN (simple ESNs in Python).

Background

The echo state network (ESN)[4] belongs to the recurrent neural network (RNN) family and provides an architecture and supervised learning principle for RNNs. Unlike feedforward neural networks, recurrent neural networks are dynamical systems rather than functions. Recurrent neural networks are typically used for:

A number of learning algorithms are available for training RNNs, including backpropagation through time and real-time recurrent learning. Convergence is not guaranteed, owing to instability and bifurcation phenomena.

The main approach of the ESN is, first, to drive a large, random, fixed recurrent neural network with the input signal, which induces a nonlinear response signal in each neuron within this "reservoir" network, and, second, to obtain the desired output signal as a trainable linear combination of all these response signals.
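The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the reservoir size, the 10% connectivity, the tanh nonlinearity, the spectral-radius scaling, the washout length, the ridge regularizer, and the toy sine-wave task are all choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters, chosen only for illustration.
n_inputs, n_reservoir = 1, 100
connectivity = 0.1      # fraction of nonzero recurrent weights (assumed)
spectral_radius = 0.9   # common scaling heuristic, an assumption here
ridge = 1e-6            # small regularizer for the linear readout

# Step 1: a fixed, random, sparse reservoir. These weights are never trained.
W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
W *= rng.random((n_reservoir, n_reservoir)) < connectivity
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))


def run_reservoir(inputs):
    """Drive the reservoir with inputs (T x n_inputs); return states (T x n_reservoir)."""
    x = np.zeros(n_reservoir)
    states = np.empty((len(inputs), n_reservoir))
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ u + W @ x)  # nonlinear response of every neuron
        states[t] = x
    return states


# Toy task (assumed): predict the next sample of a sine wave.
signal = np.sin(0.2 * np.arange(601))
inputs, targets = signal[:-1, None], signal[1:, None]

states = run_reservoir(inputs)
washout = 100  # discard the initial transient before fitting
X, Y = states[washout:], targets[washout:]

# Step 2: train only the linear readout (closed-form ridge regression).
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ Y)

train_mse = float(np.mean((X @ W_out - Y) ** 2))
print(f"training MSE: {train_mse:.2e}")
```

Only `W_out` is fitted; `W_in` and `W` stay exactly as they were drawn, which is what keeps training a single linear solve.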

Another feature of the ESN is autonomous operation in prediction: if it is trained with an input that is a backshifted (delayed) version of the output, it can then be used for signal generation or prediction by feeding the previous output back in as the input.
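A rough sketch of this closed-loop use is given below, again in NumPy and again under assumptions made only for the example (reservoir size, scaling, ridge readout, a sine wave as the signal). The network is first trained with the signal as input and the signal shifted by one step as target; afterwards each prediction is fed back as the next input.

```python
import numpy as np

rng = np.random.default_rng(1)
n_reservoir = 200  # hypothetical reservoir size

# Fixed random reservoir (scalar input, so W_in is a vector).
W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
W *= rng.random((n_reservoir, n_reservoir)) < 0.1
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))


def step(x, u):
    """One reservoir update for a scalar input u."""
    return np.tanh(W_in * u + W @ x)


signal = np.sin(0.2 * np.arange(1000))
train_len, washout = 800, 100

# Teacher-forced phase: the input is signal[t], the target is signal[t + 1].
x = np.zeros(n_reservoir)
states = np.empty((train_len, n_reservoir))
for t in range(train_len):
    x = step(x, signal[t])
    states[t] = x

X = states[washout:]
y = signal[washout + 1 : train_len + 1]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_reservoir), X.T @ y)

# Autonomous phase: each prediction becomes the next input.
n_free = 100
generated = np.empty(n_free)
u = x @ W_out  # prediction of signal[train_len]
for k in range(n_free):
    generated[k] = u
    x = step(x, u)
    u = x @ W_out
```

Here `generated[k]` is the network's estimate of `signal[train_len + k]`. How long the free-running output stays close to the true signal depends on the task and hyperparameters; no accuracy is claimed for this toy setup.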

The main idea of ESNs is closely related to liquid state machines, which were developed independently of, and simultaneously with, ESNs by Wolfgang Maass.[6] Liquid state machines, ESNs, and the more recently studied backpropagation-decorrelation learning rule for RNNs[7] are increasingly subsumed under the name reservoir computing.

Schiller and Steil also demonstrated that in conventional training approaches for RNNs, in which all weights (not only output weights) are adapted, the dominant changes are in the output weights. In cognitive neuroscience, Peter F. Dominey analysed a related process in the modelling of sequence processing in the mammalian brain, in particular speech recognition in the human brain.[8] The basic idea also included a model of temporal input discrimination in biological neuronal networks.[9] An early clear formulation of the reservoir computing idea is due to K. Kirby, who disclosed this concept in a largely forgotten conference contribution.[10] The first formulation of the reservoir computing idea known today stems from L. Schomaker,[11] who described how a desired target output could be obtained from an RNN by learning to combine signals from a randomly configured ensemble of spiking neural oscillators.

Variants

Echo state networks can be built in different ways. They can be set up with or without directly trainable input-to-output connections, with or without feedback from the output to the reservoir, with different neuron types, with different internal connectivity patterns in the reservoir, and so on. The output weights can be computed by linear regression using any algorithm, whether online or offline. In addition to least-squares solutions, margin-maximization criteria, as used in training support vector machines, are used to determine the output weights.[12] Other variants of echo state networks seek to change the formulation to better match common models of physical systems, such as those typically defined by differential equations. Work in this direction includes echo state networks which partially include physical models,[13] hybrid echo state networks,[14] and continuous-time echo state networks.[15]
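To illustrate the remark that the output weights can be fitted online as well as offline, the sketch below updates a readout one sample at a time with recursive least squares (RLS) and then compares the result with the batch ridge-regression solution. RLS is chosen here only as one familiar online algorithm; the small dense reservoir, the value of `delta`, and the sine-wave task are likewise assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_reservoir = 30  # small, dense reservoir to keep the example short

W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

signal = np.sin(0.2 * np.arange(401))
delta = 1e-2  # initial regularization; plays the role of a ridge term

x = np.zeros(n_reservoir)
w_out = np.zeros(n_reservoir)      # online estimate of the output weights
P = np.eye(n_reservoir) / delta    # running inverse of (X^T X + delta * I)
states = []

for t in range(400):
    x = np.tanh(W_in * signal[t] + W @ x)
    states.append(x)

    # RLS update (forgetting factor 1): correct w_out using the new sample.
    Px = P @ x
    gain = Px / (1.0 + x @ Px)
    error = signal[t + 1] - w_out @ x
    w_out = w_out + gain * error
    P = P - np.outer(gain, Px)

# Offline counterpart: ridge regression on all collected states at once.
X = np.array(states)
y = signal[1:]
w_batch = np.linalg.solve(X.T @ X + delta * np.eye(n_reservoir), X.T @ y)

print("max |online - batch| =", float(np.max(np.abs(w_out - w_batch))))
```

With a forgetting factor of 1 and zero initial weights, RLS started from `P = I / delta` should reproduce the ridge solution with regularizer `delta` up to rounding error, so the two weight vectors are expected to agree closely.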

The fixed RNN acts as a random, nonlinear medium whose dynamic response, the "echo", is used as a signal basis. A linear combination of this basis can be trained to reconstruct the desired output by minimizing some error criterion.

Significance

RNNs were rarely used in practice before the introduction of the ESN because of the complexity involved in adjusting their connections (e.g., lack of autodifferentiation and susceptibility to vanishing or exploding gradients). RNN training algorithms were slow and often vulnerable to issues such as bifurcations.[16] Convergence could therefore not be guaranteed. ESN training, on the other hand, does not suffer from bifurcations and is easy to implement. In early studies, ESNs were shown to perform well on time series prediction tasks from synthetic datasets.[17]

Today, many of the problems that made RNNs slow and error-prone have been addressed with the advent of autodifferentiation (deep learning) libraries, as well as more stable architectures such as long short-term memory and the gated recurrent unit; thus, the unique selling point of ESNs has been lost. RNNs have also proven themselves in several practical areas, such as language processing. Coping with tasks of similar complexity using reservoir computing methods would require memory of excessive size.

ESNs are still used in some areas, such as signal processing applications. In particular, they have been widely used as a computing principle that mixes well with non-digital computer substrates. Since ESNs do not need to modify the parameters of the RNN, they make it possible to use many different objects as their nonlinear "reservoir", for example optical microchips, mechanical nano-oscillators, polymer mixtures, or even artificial soft limbs.

References

  1. Jaeger, H.; Haas, H. (2004). Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science. 304 (5667): 78–80. doi:10.1126/science.1091277. PMID 15064413. Bibcode:2004Sci...304...78J. S2CID 2184251.
  2. Jaeger, Herbert (2007). Echo state network. Scholarpedia. 2 (9): 2330. doi:10.4249/scholarpedia.2330 (free access). Bibcode:2007SchpJ...2.2330J.
  3. Chatzis, S. P.; Demiris, Y. (2011). Echo State Gaussian Process. IEEE Transactions on Neural Networks. 22 (9): 1435–1445. doi:10.1109/TNN.2011.2162109. PMID 21803684. S2CID 8553623.
  4. Jaeger, Herbert (2002). A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. Germany: German National Research Center for Information Technology. pp. 1–45.
  5. Antonik, Piotr; Gulina, Marvyn; Pauwels, Jaël; Massar, Serge (2018). Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography. Phys. Rev. E. 98 (1): 012215. doi:10.1103/PhysRevE.98.012215. PMID 30110744. arXiv:1802.02844. Bibcode:2018PhRvE..98a2215A. S2CID 3616565.
  6. Maass W., Natschlaeger T., and Markram H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation. 14 (11): 2531–2560. doi:10.1162/089976602760407955. PMID 12433288. S2CID 1045112.
  7. Schiller U.D. and Steil J. J. (2005). Analyzing the weight dynamics of recurrent learning algorithms. Neurocomputing. 63: 5–23. doi:10.1016/j.neucom.2004.04.006.
  8. Dominey P.F. (1995). Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning. Biol. Cybernetics. 73 (3): 265–274. doi:10.1007/BF00201428. PMID 7548314. S2CID 1603500.
  9. Buonomano, D.V. and Merzenich, M.M. (1995). Temporal Information Transformed into a Spatial Code by a Neural Network with Realistic Properties. Science. 267 (5200): 1028–1030. doi:10.1126/science.7863330. PMID 7863330. Bibcode:1995Sci...267.1028B. S2CID 12880807.
  10. Kirby, K. (1991). Context dynamics in neural sequential learning. Proc. Florida AI Research Symposium. pp. 66–70.
  11. Schomaker, L. (1992). A neural oscillator-network model of temporal pattern generation. Human Movement Science. 11 (1–2): 181–192. doi:10.1016/0167-9457(92)90059-K.
  12. Schmidhuber J., Gomez F., Wierstra D., and Gagliolo M. (2007). Training recurrent networks by evolino. Neural Computation. 19 (3): 757–779. doi:10.1162/neco.2007.19.3.757. PMID 17298232. S2CID 11745761. CiteSeerX 10.1.1.218.3086.
  13. Doan N, Polifke W, Magri L (2020). Physics-Informed Echo State Networks. Journal of Computational Science. 47: 101237. doi:10.1016/j.jocs.2020.101237. arXiv:2011.02280. S2CID 226246385.
  14. Pathak J, Wikner A, Russel R, Chandra S, Hunt B, Girvan M, Ott E (2018). Hybrid Forecasting of Chaotic Processes: Using Machine Learning in Conjunction with a Knowledge-Based Model. Chaos. 28 (4): 041101. doi:10.1063/1.5028373. PMID 31906641. arXiv:1803.04779. Bibcode:2018Chaos..28d1101P. S2CID 3883587.
  15. Anantharaman, Ranjan; Ma, Yingbo; Gowda, Shashi; Laughman, Chris; Shah, Viral; Edelman, Alan; Rackauckas, Chris (2020). Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks. arXiv:2010.04004 [cs.LG].
  16. Doya K. (1992). Bifurcations in the learning of recurrent neural networks. [Proceedings] 1992 IEEE International Symposium on Circuits and Systems. Vol. 6. pp. 2777–2780. doi:10.1109/ISCAS.1992.230622. ISBN 0-7803-0593-0. S2CID 15069221.
  17. Jaeger H. (2007). Discovering multiscale dynamical features with hierarchical echo state networks. Technical Report 10, School of Engineering and Science, Jacobs University.