Research reports
Years: 2024 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991
Long Expressive Memory for Sequence Modeling
by T. K. Rusch and S. Mishra and N. B. Erichson and M. W. Mahoney
(Report number 2021-33)
Abstract
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
Keywords: sequence modeling, long-term dependencies, multiscale ordinary differential equations, dynamical systems
BibTeX@Techreport{RMEM21_975, author = {T. K. Rusch and S. Mishra and N. B. Erichson and M. W. Mahoney}, title = {Long Expressive Memory for Sequence Modeling}, institution = {Seminar for Applied Mathematics, ETH Z{\"u}rich}, number = {2021-33}, address = {Switzerland}, url = {https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-33.pdf }, year = {2021} }
Disclaimer
© Copyright for documents on this server remains with the authors.
Copies of these documents made by electronic or mechanical means including
information storage and retrieval systems, may only be employed for
personal use. The administrators respectfully request that authors
inform them when any paper is published to avoid copyright infringement.
Note that unauthorised copying of copyright material is illegal and may
lead to prosecution. Neither the administrators nor the Seminar for
Applied Mathematics (SAM) accept any liability in this respect.
The most recent version of a SAM report may differ in formatting and style
from published journal version. Do reference the published version if
possible (see SAM
Publications).