Research reports

UnICORNN: A recurrent model for learning very long time dependencies

by T. K. Rusch and S. Mishra

(Report number 2021-10)

Abstract
The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture based on a structure-preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), and memory efficient, and we derive rigorous bounds on the hidden-state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long-time dependencies.
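
To make the construction concrete, the following is a minimal sketch in Python/NumPy of the kind of update the abstract describes: a symplectic (structure-preserving) Euler step for a second-order oscillator ODE written in first-order form, with a hidden "position" y and "velocity" z. All names and parameter choices here (unicornn_step, w, V, b, c, alpha, dt, the sigmoid gate) are illustrative assumptions, not the authors' exact formulation; consult the report for the precise architecture.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unicornn_step(y, z, u, w, V, b, c, alpha=0.9, dt=0.03):
    # One symplectic-Euler step for y'' = -[tanh(w*y + V@u + b) + alpha*y],
    # with a learnable per-neuron time-scale gate sigmoid(c).
    # (Illustrative sketch; not the report's exact equations.)
    g = sigmoid(c)
    z = z - dt * g * (np.tanh(w * y + V @ u + b) + alpha * y)  # velocity update
    y = y + dt * g * z                                         # position update uses new z
    return y, z

def unicornn_step_inverse(y, z, u, w, V, b, c, alpha=0.9, dt=0.03):
    # Exact inverse of unicornn_step: the discretization is invertible in
    # time, so hidden states can be recomputed during the backward pass
    # instead of stored, one way to realize the claimed memory efficiency.
    g = sigmoid(c)
    y = y - dt * g * z                                         # undo position update
    z = z + dt * g * (np.tanh(w * y + V @ u + b) + alpha * y)  # undo velocity update
    return y, z

# Round-trip check: inverting one step recovers the previous hidden state.
rng = np.random.default_rng(0)
m, d = 4, 3                                  # hidden and input dimensions
y0, z0 = rng.normal(size=m), rng.normal(size=m)
w, b, c = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m)
V, u = rng.normal(size=(m, d)), rng.normal(size=d)
y1, z1 = unicornn_step(y0, z0, u, w, V, b, c)
yr, zr = unicornn_step_inverse(y1, z1, u, w, V, b, c)
assert np.allclose(y0, yr) and np.allclose(z0, zr)

Updating the velocity with the old position and then the position with the new velocity is what makes the step symplectic, and it is this structure that makes each step exactly reversible.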

Keywords: Recurrent neural network, Hamiltonian system, Gradient stability, Long-term dependencies

BibTeX
@Techreport{RM21_952,
  author = {T. K. Rusch and S. Mishra},
  title = {UnICORNN: A recurrent model for learning very long time dependencies},
  institution = {Seminar for Applied Mathematics, ETH Z{\"u}rich},
  number = {2021-10},
  address = {Switzerland},
  url = {https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-10.pdf},
  year = {2021}
}

Disclaimer
© Copyright for documents on this server remains with the authors. Copies of these documents made by electronic or mechanical means, including information storage and retrieval systems, may only be used for personal purposes. The administrators respectfully request that authors inform them when any paper is published, to avoid copyright infringement. Note that unauthorised copying of copyright material is illegal and may lead to prosecution. Neither the administrators nor the Seminar for Applied Mathematics (SAM) accept any liability in this respect. The most recent version of a SAM report may differ in formatting and style from the published journal version. Please reference the published version where possible (see SAM Publications).