Long Expressive Memory for Sequence Modeling

by T. K. Rusch, S. Mishra, N. B. Erichson, and M. W. Mahoney

(Report number 2021-33)

We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, can efficiently process sequential tasks with very long-term dependencies, and is expressive enough to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds showing that it mitigates the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.

Keywords: sequence modeling, long-term dependencies, multiscale ordinary differential equations, dynamical systems

  author = {T. K. Rusch and S. Mishra and N. B. Erichson and M. W. Mahoney},
  title = {Long Expressive Memory for Sequence Modeling},
  institution = {Seminar for Applied Mathematics, ETH Z{\"u}rich},
  number = {2021-33},
  address = {Switzerland},
  url = {https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-33.pdf},
  year = {2021}

