Recurrent neural network

A recurrent neural network (RNN) is the general term for the class of neural networks that contain cycles internally.^[1]

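Overview

A neural network is a network of processing units, each of which applies a linear transformation (usually followed by a nonlinear activation function) to its inputs. When the network contains a cycle, it is called a recurrent neural network.^[1] A cycle means that the output of a unit is fed back into that same unit along some path. RNNs are contrasted with networks without recurrence, the feed-forward neural networks (FFN).

RNNs can use their internal state (memory) to process input sequences of arbitrary length. This allows them to exhibit temporal dynamic behavior on time series.^[2] As a result, they can be applied to tasks such as unsegmented, connected handwriting recognition^[3] and speech recognition.^[4]^[5]

The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure: one with finite impulse response and the other with infinite impulse response. Both classes exhibit temporal dynamic behavior.^[6] A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feed-forward neural network. An infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Both finite impulse and infinite impulse recurrent networks can have additional stored states, and this storage can be under the direct control of the neural network. The storage can also be replaced by another network or graph, provided that network or graph incorporates time delays or has feedback loops. Such controlled states are referred to as gated states or gated memory. They are part of long short-term memory networks (LSTMs) and gated recurrent units (GRUs).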

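Japanese terminology

In Japanese, RNNs are also called 再帰型ニューラルネット or 循環ニューラルネット.^[7] This article follows the convention of translating "recurrent" neural network as 回帰型 and "recursive" neural network as 再帰型, so that the two terms stay distinct.^[8]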

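History

Recurrent neural networks were based on David Rumelhart's work in 1986.^[9] Hopfield networks were introduced by John Hopfield in 1982. In 1993, a neural history compressor system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.^[10]

Long short-term memory (LSTM) began to revolutionize speech recognition around 2007, outperforming traditional models in certain speech applications.^[11] In 2009, an LSTM network trained by connectionist temporal classification (CTC) became the first RNN to win pattern recognition contests, winning several competitions in connected handwriting recognition.^[12]^[13] In 2014, the Chinese search giant Baidu used CTC-trained RNNs to break the Switchboard Hub5'00 speech recognition benchmark without using any traditional speech processing methods.^[14]

LSTM also improved large-vocabulary speech recognition^[4]^[5] and text-to-speech synthesis^[15] and was used in Google Android.^[12]^[16] In 2015, Google's speech recognition was reported to have achieved a dramatic performance jump through CTC-trained LSTM,^[17] and this technology was used in Google Voice Search.

LSTM broke records for machine translation,^[18] language modeling^[19] and multilingual language processing.^[20] LSTM combined with convolutional neural networks (CNNs) improved automatic image captioning (generating short descriptions of images).^[21]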

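Architectures

RNNs come in many variants.

Fully recurrent

Basic RNNs are networks of neuron-like nodes organized into successive "layers". Each node in a given layer is connected by a directed (one-way) connection to every node in the next layer.^[citation needed] Each node (neuron) has a time-varying real-valued activation. Each connection (synapse) has a modifiable real-valued weight. Every node is one of three kinds:

- input nodes, which receive data from outside the network;
- output nodes, which yield results;
- hidden nodes, which modify the data en route from input to output.

For supervised learning in discrete-time settings, sequences of real-valued input vectors arrive at the input nodes, one vector at a time. At any given time step, each non-input unit computes its current activation (result) as a nonlinear function of the weighted sum of the activations of all units that connect to it. Teacher-given target activations can be supplied for some output units at certain time steps. For example, suppose the input sequence is a speech signal corresponding to a spoken digit. The final target output at the end of the sequence may then be a label classifying the digit.

In reinforcement learning settings, no teacher provides target signals. Instead, a fitness function or reward function may be used to evaluate the RNN's performance. The RNN influences its own input stream through output units connected to actuators that affect the environment. This setup might be used to play a game in which progress is measured by the number of points won.

Each sequence produces an error equal to the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences.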
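The following minimal sketch illustrates the two paragraphs above; it is not taken from the cited sources. It runs a small fully recurrent network one time step at a time and then sums the error over sequences. Several details are assumptions made for illustration:

- the weight layout (`weights[i]` holds unit `i`'s incoming weights, inputs first);
- the `tanh` nonlinearity;
- squared deviation as the per-target error.

A target exists only at the time steps where `targets[t]` is not `None`.

```python
import math
import random

def step(weights, activations, x):
    """One time step: each non-input unit applies tanh to the weighted sum of all
    external inputs and all non-input activations from the previous step.
    weights[i] = incoming weights of unit i (external inputs first, then units)."""
    source = list(x) + list(activations)
    return [math.tanh(sum(w * a for w, a in zip(row, source))) for row in weights]

def sequence_error(weights, inputs, targets, output_units):
    """Sum of squared deviations over every (output unit, step) that has a target."""
    act = [0.0] * len(weights)
    err = 0.0
    for x, target in zip(inputs, targets):
        act = step(weights, act, x)
        if target is not None:
            err += sum((act[u] - d) ** 2 for u, d in zip(output_units, target))
    return err

def total_error(weights, dataset, output_units):
    """Error of a training set = sum of the per-sequence errors."""
    return sum(sequence_error(weights, xs, ds, output_units) for xs, ds in dataset)

random.seed(0)
n_in, n_units = 1, 3
W = [[random.uniform(-0.5, 0.5) for _ in range(n_in + n_units)] for _ in range(n_units)]
# One input sequence of length 3; unit 2 has a target (0.5) only at the last step
seq = ([[1.0], [0.0], [-1.0]], [None, None, [0.5]])
print(total_error(W, [seq, seq], output_units=[2]))
```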

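Elman networks and Jordan networks

An Elman network is a three-layer network (arranged vertically as x, y and z in the figure on the right) with an added set of "context units" (u in the figure on the right). The middle (hidden) layer is connected to these context units with weights fixed at one.^[22] At each time step, the input is fed forward and a learning rule is applied. The fixed back-connections save a copy of the previous values of the hidden units in the context units. This works because the values propagate over the connections before the learning rule is applied. The network can therefore maintain a sort of state. This lets it perform tasks such as sequence prediction that are beyond the power of a standard multilayer perceptron.

Jordan networks are similar to Elman networks, except that the context units are fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer. They have a recurrent connection to themselves.^[22]

Elman and Jordan networks are also known as "simple recurrent networks" (SRN).

Elman network^[23]: ${\begin{aligned}h_{t}&=\sigma _{h}(W_{h}x_{t}+U_{h}h_{t-1}+b_{h})\\y_{t}&=\sigma _{y}(W_{y}h_{t}+b_{y})\end{aligned}}$
Jordan network^[24]: ${\begin{aligned}h_{t}&=\sigma _{h}(W_{h}x_{t}+U_{h}y_{t-1}+b_{h})\\y_{t}&=\sigma _{y}(W_{y}h_{t}+b_{y})\end{aligned}}$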

Variables and functions

$x_{t}$ : input vector
$h_{t}$ : hidden layer vector
$y_{t}$ : output vector
$W$, $U$ and $b$ : parameter matrices and vector
$\sigma _{h}$ and $\sigma _{y}$ : activation functions
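The two update rules can be written directly in Python. This sketch uses plain lists and makes three assumptions:

- $\sigma_h = \tanh$;
- $\sigma_y$ is the logistic sigmoid (a common choice);
- the runs start from $h_0 = 0$ and $y_0 = 0$.

The only difference between the two functions is which vector is fed back through $U_h$: the previous hidden state (Elman) or the previous output (Jordan). As a result, $U_h$ is hidden × hidden in the first case and hidden × output in the second.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vectors):
    return [sum(parts) for parts in zip(*vectors)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def elman_run(xs, Wh, Uh, bh, Wy, by):
    h, ys = [0.0] * len(bh), []                     # h_0 = 0
    for x in xs:
        h = [math.tanh(z) for z in add(matvec(Wh, x), matvec(Uh, h), bh)]  # feed back h_{t-1}
        ys.append([sigmoid(z) for z in add(matvec(Wy, h), by)])
    return ys

def jordan_run(xs, Wh, Uh, bh, Wy, by):
    y, ys = [0.0] * len(by), []                     # y_0 = 0
    for x in xs:
        h = [math.tanh(z) for z in add(matvec(Wh, x), matvec(Uh, y), bh)]  # feed back y_{t-1}
        y = [sigmoid(z) for z in add(matvec(Wy, h), by)]
        ys.append(y)
    return ys

xs = [[1.0], [0.5], [-1.0]]
Wh, bh = [[0.5], [-0.3]], [0.0, 0.1]                # 1 input, 2 hidden units
Wy, by = [[1.0, -1.0]], [0.0]                       # 1 output unit
print(elman_run(xs, Wh, [[0.8, 0.0], [0.0, 0.8]], bh, Wy, by))
print(jordan_run(xs, Wh, [[0.8], [0.8]], bh, Wy, by))
```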

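Hopfield

Main article: Hopfield network

The Hopfield network is an RNN in which all connections are symmetric. It requires stationary inputs and does not process sequences of patterns, so it is not a general RNN. It is guaranteed to converge to a stable state when its units are updated asynchronously (one at a time). If the connections are trained using Hebbian learning, the Hopfield network can act as a robust content-addressable memory that resists changes to its connections.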
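The following minimal sketch shows Hebbian storage and asynchronous recall. It assumes three things:

- bipolar (±1) units;
- zero self-connections;
- the outer-product (Hebbian) rule $w_{ij} = \frac{1}{n}\sum_\mu p^\mu_i p^\mu_j$.

The weights come out symmetric, and units are updated one at a time, which matches the convergence condition mentioned above. In the example, two orthogonal patterns are stored. The first pattern is then restored from a copy with one flipped bit.

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) rule with zero diagonal; the result is symmetric."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]

def recall(W, state, max_sweeps=100):
    """Asynchronous updates: one unit at a time until a full sweep changes nothing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([p1, p2])
noisy = [-1] + p1[1:]            # p1 with its first bit flipped
print(recall(W, noisy) == p1)    # → True
```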

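Bidirectional associative memory

Main article: Bidirectional associative memory

Introduced by Bart Kosko,^[25] a bidirectional associative memory (BAM) network is a variant of a Hopfield network that stores associative data as a vector. The bidirectionality comes from passing information through a matrix and its transpose. Typically, bipolar encoding is preferred to binary encoding of the associative pairs. Recently, stochastic BAM models using Markov stepping (jumping) were optimized for increased network stability and relevance to real-world applications.^[26]

A BAM network has two layers. Either layer can be driven as an input to recall an association and produce an output on the other layer.^[27]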
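A minimal sketch of a BAM, assuming bipolar vectors and the outer-product weight matrix $W = \sum_k a_k b_k^\top$. Recall bounces activity through $W^\top$ (layer A to layer B) and $W$ (back to A) until neither layer changes. Driving the other layer is the same procedure with the transposed matrix. The tie rule (keep the previous value when the net input is zero) is an illustrative assumption.

```python
def bam_weights(pairs):
    """W = sum_k a_k b_k^T for bipolar pairs (a_k, b_k); W is n x m."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(a[i] * b[j] for a, b in pairs) for j in range(m)] for i in range(n)]

def transpose(W):
    return [list(col) for col in zip(*W)]

def threshold(net, prev):
    """Bipolar threshold; a net input of exactly 0 keeps the previous value."""
    return [1 if z > 0 else -1 if z < 0 else p for z, p in zip(net, prev)]

def bam_recall(W, a, max_iters=20):
    """Drive layer A with `a`; alternate W^T (A -> B) and W (B -> A) until stable."""
    Wt = transpose(W)
    b = [1] * len(Wt)
    for _ in range(max_iters):
        b_new = threshold([sum(w * x for w, x in zip(row, a)) for row in Wt], b)
        a_new = threshold([sum(w * y for w, y in zip(row, b_new)) for row in W], a)
        if a_new == a and b_new == b:
            break
        a, b = a_new, b_new
    return a, b

a1, b1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
a2, b2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
W = bam_weights([(a1, b1), (a2, b2)])
print(bam_recall(W, a1)[1] == b1)              # A -> B: → True
print(bam_recall(transpose(W), b2)[1] == a2)   # B -> A via the transpose: → True
```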

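Echo state

Main article: Echo state network

The echo state network (ESN) has a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change (be trained). ESNs are good at reproducing certain time series.^[28] A variant for spiking neurons is known as a liquid state machine.^[29]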
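The following plain-Python sketch shows the division of labor in an ESN. The sparse random reservoir and the input weights are fixed. Only a linear readout is fitted, by ridge regression on the collected states.

Everything concrete here is an illustrative assumption:

- the sizes: 30 units, 20% connection density, norm scale 0.9, ridge 1e-4;
- the one-step-ahead sine-wave task;
- the reservoir scaling.

On the last point, the reservoir is rescaled by its maximum absolute row sum instead of the more common spectral radius. This avoids an eigenvalue computation, and because tanh is 1-Lipschitz it is a conservative sufficient condition for the state update to be a contraction.

```python
import math
import random

def make_reservoir(n, density, scale, rng):
    """Sparse random recurrent weights rescaled so the infinity norm equals `scale`."""
    W = [[rng.uniform(-1, 1) if rng.random() < density else 0.0 for _ in range(n)]
         for _ in range(n)]
    norm = max(sum(abs(w) for w in row) for row in W) or 1.0
    return [[w * scale / norm for w in row] for row in W]

def run_reservoir(W, w_in, inputs):
    """Collect reservoir states; W and w_in stay fixed (they are never trained)."""
    x, states = [0.0] * len(W), []
    for u in inputs:
        x = [math.tanh(sum(a * b for a, b in zip(row, x)) + wi * u)
             for row, wi in zip(W, w_in)]
        states.append(x)
    return states

def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def train_readout(states, targets, ridge=1e-4):
    """Ridge regression (S^T S + ridge I) w = S^T y: the only trained weights."""
    n = len(states[0])
    A = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(A, b)

rng = random.Random(0)
n = 30
W = make_reservoir(n, density=0.2, scale=0.9, rng=rng)
w_in = [rng.uniform(-1, 1) for _ in range(n)]
signal = [math.sin(0.2 * t) for t in range(301)]
states = run_reservoir(W, w_in, signal[:-1])[50:]   # discard a 50-step washout
ys = signal[51:]                                    # one-step-ahead targets
w_out = train_readout(states, ys)
preds = [sum(w * s for w, s in zip(w_out, st)) for st in states]
mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
print(mse)
```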

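Independently recurrent neural network (IndRNN)

The independently recurrent neural network (IndRNN^[30]) addresses the vanishing and exploding gradient problems of traditional fully connected RNNs. In a fully connected layer, each neuron receives the states of all other neurons in the layer. In an IndRNN, each neuron instead receives only its own past state as context information, so neurons are independent of each other's history. Gradient backpropagation can then be regulated to avoid vanishing and exploding gradients, which lets the network keep long-term or short-term memory. Information across neurons is explored in the following layers. IndRNNs can be trained robustly with non-saturating nonlinear functions such as ReLU. With skip connections, deep networks can be trained.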
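A minimal sketch of one IndRNN layer with ReLU, $h_t = \mathrm{ReLU}(W x_t + u \odot h_{t-1} + b)$. The recurrent weight is a vector $u$ applied elementwise, so neuron $i$ only sees its own previous value. While neuron $i$ stays active, its gradient through $T$ steps is simply $u_i^T$. This is what makes per-neuron control of vanishing or exploding gradients possible. The weights below are hand-picked assumptions that show one fast-forgetting neuron and one neuron that keeps its value.

```python
def indrnn_layer(xs, W, u, b):
    """h_t = relu(W x_t + u * h_{t-1} + b) with an elementwise recurrent weight u."""
    h, out = [0.0] * len(u), []
    for x in xs:
        h = [max(0.0, sum(w * xi for w, xi in zip(W[i], x)) + u[i] * h[i] + b[i])
             for i in range(len(u))]
        out.append(h)
    return out

W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, 1.0]            # neuron 0 halves its state each step; neuron 1 keeps it
b = [0.0, 0.0]
xs = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(indrnn_layer(xs, W, u, b)[-1])   # → [0.25, 1.0]
```

Stacking a second `indrnn_layer` whose input matrix `W` is dense is one way to mix information across neurons, as described above.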

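Recursive

Main article: Recursive neural network

A recursive neural network^[31] is created by applying the same set of weights recursively over a differentiable graph-like structure, traversing the structure in topological order. Such networks are typically trained by the reverse mode of automatic differentiation.^[32]^[33] They can process distributed representations of structure, such as logical terms. A special case of recursive neural networks is the recurrent neural network, whose structure corresponds to a linear chain. Recursive neural networks have been applied to natural language processing.^[34] The recursive neural tensor network uses a tensor-based composition function for all nodes in the tree.^[35]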
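A minimal sketch of a recursive network over binary trees, assuming the composition function $\text{parent} = \tanh(W[\text{left};\text{right}] + b)$ and toy 2-dimensional word vectors. These are hypothetical values chosen for illustration. Children are always encoded before their parent, which is a topological order. The last line of the usage checks the special case mentioned above: on a left-branching chain, the recursive network computes exactly what an ordinary RNN computes over the same words.

```python
import math

def compose(W, b, left, right):
    """Shared composition function: parent = tanh(W [left; right] + b)."""
    z = left + right                      # concatenate the two child vectors
    return [math.tanh(sum(w * v for w, v in zip(row, z)) + bi) for row, bi in zip(W, b)]

def encode(tree, W, b, embed):
    """A tree is a word (str) or a pair (left, right); the same W, b are used at
    every internal node, and children are encoded before their parent."""
    if isinstance(tree, str):
        return embed[tree]
    left, right = tree
    return compose(W, b, encode(left, W, b, embed), encode(right, W, b, embed))

def rnn(words, W, b, embed):
    """Linear-chain special case: an ordinary recurrent network over the words."""
    h = embed[words[0]]
    for word in words[1:]:
        h = compose(W, b, h, embed[word])
    return h

embed = {"the": [0.1, 0.3], "cat": [0.5, -0.2], "sat": [-0.4, 0.2], "down": [0.3, 0.3]}
W = [[0.5, -0.1, 0.3, 0.2], [0.1, 0.4, -0.3, 0.6]]
b = [0.0, 0.1]
parse = (("the", "cat"), ("sat", "down"))      # a balanced binary tree
chain = ((("the", "cat"), "sat"), "down")      # a left-branching chain
print(encode(parse, W, b, embed))
print(encode(chain, W, b, embed) == rnn(["the", "cat", "sat", "down"], W, b, embed))  # → True
```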

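Neural history compressor

The neural history compressor is an unsupervised stack of RNNs.^[36] At the input level, it learns to predict its next input from the previous inputs. Only the unpredictable inputs of an RNN in the hierarchy become inputs to the next higher-level RNN. That higher-level RNN therefore recomputes its internal state only rarely. Each higher-level RNN thus learns a compressed representation of the information in the RNN below it. The compression is done so that the input sequence can be reconstructed exactly from the representation at the highest level.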
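The filtering idea can be illustrated without any learning. In the sketch below, a fixed rule ("the next symbol repeats the previous one") stands in for the trained lower-level RNN predictor. This substitution is an assumption made purely for illustration. Only mispredicted symbols, tagged with their time step, are passed up. The higher level uses the same predictor, so it can rebuild the full sequence exactly from this shorter stream.

```python
def predict_repeat(history):
    """Stand-in for a trained predictor: guess that the last symbol repeats."""
    return history[-1] if history else None

def pass_up(sequence, predictor):
    """Keep only the (time, symbol) pairs the lower level failed to predict."""
    events, history = [], []
    for t, symbol in enumerate(sequence):
        if predictor(history) != symbol:
            events.append((t, symbol))
        history.append(symbol)
    return events

def reconstruct(events, predictor, length):
    """Invert pass_up: every step not listed is filled with the predictor's guess."""
    surprises, seq = dict(events), []
    for t in range(length):
        seq.append(surprises[t] if t in surprises else predictor(seq))
    return seq

seq = "aaaabbbbbbcc"
events = pass_up(seq, predict_repeat)
print(events)   # → [(0, 'a'), (4, 'b'), (10, 'c')]
```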

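The system effectively minimizes the description length, or equivalently the negative logarithm of the probability of the data.^[37] If the incoming data sequence contains a lot of learnable predictability, the highest-level RNN can use supervised learning to classify even deep sequences with long intervals between important events.

It is possible to distill the RNN hierarchy into two RNNs: the "conscious" chunker (higher level) and the "subconscious" automatizer (lower level).^[36] First, the chunker learns to predict and compress the inputs that the automatizer cannot predict. In the next learning phase, the automatizer is made to predict or imitate, through additional units, the hidden units of the more slowly changing chunker. This makes it easy for the automatizer to learn appropriate, rarely changing memories across long intervals. In turn, this helps the automatizer make many of its previously unpredictable inputs predictable, so that the chunker can focus on the remaining unpredictable events.^[36]

A generative model partially overcame the vanishing gradient problem of automatic differentiation, or backpropagation, in 1992.^[38] In 1993, such a system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.^[10]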

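Second-order RNNs

Second-order RNNs use higher-order weights $w_{ijk}$ instead of the standard weights $w_{ij}$, so the state update involves products of state and input units. This allows a direct mapping to a finite-state machine in training, stability and representation.^[39]^[40] Long short-term memory is an example of this, but it has no such formal mapping or proof of stability.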
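The "direct mapping to a finite-state machine" can be made concrete. This sketch is a simplified illustration in the spirit of the cited work, not its exact construction. It uses one-hot state and input units. Each weight $w_{ijk}$ is copied from the transition table of a deterministic automaton: it is 1 if symbol $k$ moves state $j$ to state $i$, and 0 otherwise. A sigmoid with a large gain `H` (an assumed value of 20) keeps the states close to one-hot. The automaton here is the two-state parity machine: it tracks whether the number of 1s read so far is even or odd.

```python
import math

def delta(q, a):
    """Parity DFA: state 0 = even, state 1 = odd number of 1s; symbol 1 flips it."""
    return q if a == 0 else 1 - q

# Second-order weights read off the transition table: w[i][j][k]
w = [[[1.0 if delta(j, k) == i else 0.0 for k in range(2)] for j in range(2)]
     for i in range(2)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def run(bits, H=20.0):
    h = [1.0, 0.0]                                   # one-hot start state: even
    for ch in bits:
        x = [1.0, 0.0] if ch == "0" else [0.0, 1.0]  # one-hot input symbol
        # second-order term: products h_j * x_k weighted by w_ijk
        h = [sigmoid(H * (sum(w[i][j][k] * h[j] * x[k]
                              for j in range(2) for k in range(2)) - 0.5))
             for i in range(2)]
    return "odd" if h[1] > h[0] else "even"

print(run("1101"))   # → odd
```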

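Long short-term memory

Main article: Long short-term memory

Long short-term memory (LSTM) is a deep learning system that avoids the vanishing gradient problem. LSTM is normally augmented by recurrent gates called "forget gates".^[41] LSTM prevents backpropagated errors from vanishing or exploding.^[38] Instead, errors can flow backwards through unlimited numbers of virtual layers unfolded in space. As a result, LSTM can learn tasks that require memories of events that happened thousands or even millions of discrete time steps earlier.^[12] Problem-specific LSTM-like topologies can be evolved.^[42] LSTM works even when there are long delays between significant events, and it can handle signals that mix low- and high-frequency components.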
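Below is a minimal sketch of one LSTM cell step with a forget gate. It uses the common formulation:

- $c_t = f_t \odot c_{t-1} + i_t \odot g_t$;
- $h_t = o_t \odot \tanh(c_t)$;
- sigmoid gates $f, i, o$ and a tanh candidate $g$.

The usage is a hand-built experiment, with all weights zero and the gates set by their biases (assumed values). It stores 1.0 in the cell and shows the effect of the forget gate. When the gate is close to 1, the additive cell update carries the value almost unchanged through 100 steps. When the gate is 0.5, the value is wiped out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(p, x, h):
    """W x + U h + b for one gate, with p = (W, U, b) as nested lists."""
    W, U, b = p
    return [sum(w * xi for w, xi in zip(Wr, x)) + sum(u * hi for u, hi in zip(Ur, h)) + bi
            for Wr, Ur, bi in zip(W, U, b)]

def lstm_step(params, x, h, c):
    f = [sigmoid(a) for a in affine(params["f"], x, h)]    # forget gate
    i = [sigmoid(a) for a in affine(params["i"], x, h)]    # input gate
    o = [sigmoid(a) for a in affine(params["o"], x, h)]    # output gate
    g = [math.tanh(a) for a in affine(params["g"], x, h)]  # candidate cell input
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # additive cell update
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def gate(bias):
    """1 input, 1 cell, zero weights: the gate value is fixed by its bias."""
    return ([[0.0]], [[0.0]], [bias])

def run_cell(forget_bias, steps=100):
    """Store 1.0 in the cell, feed zeros, and return what is left after `steps`."""
    params = {"f": gate(forget_bias), "i": gate(-10.0), "o": gate(0.0), "g": gate(0.0)}
    h, c = [0.0], [1.0]
    for _ in range(steps):
        h, c = lstm_step(params, [0.0], h, c)
    return c[0]

print(run_cell(10.0), run_cell(0.0))   # about 0.995 versus about 8e-31
```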

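Many applications use stacks of LSTM RNNs^[43] and train them by connectionist temporal classification (CTC).^[44] The training looks for an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.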
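CTC lets one label sequence correspond to many frame-level output paths. A path is mapped to labels by merging repeated symbols and then deleting a special blank symbol. The probability of the labels is the sum of the probabilities of all paths that map to them. The sketch below shows this mapping and computes that sum by brute-force enumeration, purely to make the definition concrete.

Assumptions:

- `"-"` is used as the blank;
- the per-frame probabilities are a hypothetical uniform toy example.

Practical CTC implementations compute the same sum with a forward-backward dynamic program rather than enumeration.

```python
from itertools import groupby, product

def ctc_collapse(path, blank="-"):
    """CTC's many-to-one map: merge repeated symbols, then delete blanks."""
    return "".join(sym for sym, _ in groupby(path) if sym != blank)

def ctc_prob_bruteforce(probs, labels, alphabet):
    """p(labels | x) = sum over all frame-level paths that collapse to `labels`.
    probs[t][s] is the network's softmax output for symbol s at frame t.
    Enumerates len(alphabet) ** T paths, so only usable for tiny examples."""
    total = 0.0
    for path in product(alphabet, repeat=len(probs)):
        if ctc_collapse(path) == labels:
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[t][sym]
            total += p
    return total

print(ctc_collapse("--hh-e-ll-ll-oo"))            # → hello
uniform = [{"-": 0.5, "a": 0.5}] * 2
print(ctc_prob_bruteforce(uniform, "a", "-a"))    # → 0.75
```

The blank between the two `ll` runs is what allows a genuine double letter to survive the merging step.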

Unlike previous models based on hidden Markov models (HMM) and similar concepts, LSTM can learn to recognize context-sensitive languages.^[45]

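Gated recurrent unit

Main article: Gated recurrent unit

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014. They are used in the full form and in several simplified variants.^[46]^[47] Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory.^[48] They have fewer parameters than LSTM because they lack an output gate.^[49]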
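A minimal sketch of a GRU step in one common formulation. It has an update gate $z$ and a reset gate $r$, and the new state is the interpolation $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde h_t$. Some papers swap the roles of $z$ and $1 - z$. There is no output gate: the state is exposed directly.

The helper `n_params` illustrates the parameter claim above. It assumes that every gate or candidate block has its own input matrix, recurrent matrix and bias. Under that assumption, an LSTM needs four such blocks and a GRU needs three.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(p, x, h):
    """W x + U h + b for one block, with p = (W, U, b) as nested lists."""
    W, U, b = p
    return [sum(w * xi for w, xi in zip(Wr, x)) + sum(u * hi for u, hi in zip(Ur, h)) + bi
            for Wr, Ur, bi in zip(W, U, b)]

def gru_step(params, x, h):
    z = [sigmoid(a) for a in affine(params["z"], x, h)]            # update gate
    r = [sigmoid(a) for a in affine(params["r"], x, h)]            # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a) for a in affine(params["h"], x, rh)]  # candidate state
    return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def n_params(n_in, n_hidden, blocks):
    """blocks x (W: n_hidden x n_in, U: n_hidden x n_hidden, b: n_hidden)."""
    return blocks * (n_hidden * (n_in + n_hidden) + n_hidden)

print(n_params(10, 20, blocks=4), n_params(10, 20, blocks=3))   # LSTM vs GRU: → 2480 1860

# An update gate biased far towards 0 leaves the state (almost) untouched
params = {"z": ([[0.0]], [[0.0]], [-20.0]),
          "r": ([[1.0]], [[1.0]], [0.0]),
          "h": ([[1.0]], [[1.0]], [0.0])}
print(gru_step(params, [1.0], [0.7]))
```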

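Bidirectional

Bidirectional RNNs use a finite sequence to predict or label each element of the sequence based on both the element's past and future contexts. This is done by combining the outputs of two RNNs. One RNN processes the sequence from left to right, and the other from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique has proven especially useful when combined with LSTM RNNs.^[50]^[51]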
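A minimal sketch of the bidirectional scheme. It assumes a scalar tanh RNN with hand-picked weights for both directions. The backward RNN reads the reversed sequence, and its outputs are reversed again so that both directions line up position by position. The usage changes only the last input element. The forward output at position 0 is unaffected, but the backward output at position 0 changes, so the combined output at each position depends on future context.

```python
import math

def simple_rnn(xs, w_x, w_h):
    """Scalar RNN used for both directions (the weights are illustrative)."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        out.append(h)
    return out

def bidirectional(xs, fwd=(1.0, 0.5), bwd=(1.0, 0.5)):
    forward = simple_rnn(xs, *fwd)               # reads left to right
    backward = simple_rnn(xs[::-1], *bwd)[::-1]  # reads right to left, then re-aligned
    return list(zip(forward, backward))          # combined output at each position

a = bidirectional([1.0, 0.0, 0.0])
b = bidirectional([1.0, 0.0, 1.0])   # only the last element differs
print(a[0], b[0])
```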

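Continuous-time

A continuous-time recurrent neural network (CTRNN) uses a system of ordinary differential equations to model the effects of an incoming spike train on a neuron.

For a neuron $i$ in the network with activation $y_{i}$, the rate of change of activation is given by:

$\tau _{i}{\dot {y}}_{i}=-y_{i}+\sum _{j=1}^{n}w_{ji}\sigma (y_{j}-\Theta _{j})+I_{i}(t)$

where:

$\tau _{i}$ : time constant of the postsynaptic node
$y_{i}$ : activation of the postsynaptic node
${\dot {y}}_{i}$ : rate of change of activation of the postsynaptic node
$w_{ji}$ : weight of the connection from the presynaptic node to the postsynaptic node
$\sigma (x)$ : sigmoid of x, e.g. $\sigma (x)=1/(1+e^{-x})$
$y_{j}$ : activation of the presynaptic node
$\Theta _{j}$ : bias of the presynaptic node
$I_{i}(t)$ : input (if any) to the node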

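CTRNNs have been applied to evolutionary robotics, where they have been used to address vision,^[52] cooperation^[53] and minimally cognitive behavior.^[54]

Note that, by the Shannon sampling theorem, a discrete-time recurrent neural network can be viewed as a continuous-time recurrent neural network whose differential equations have been transformed into equivalent difference equations. This transformation can be thought of as happening after the postsynaptic node activations $y_{i}(t)$ have been low-pass filtered, but before they are sampled.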
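One simple way to turn the CTRNN equation into a difference equation is a forward-Euler step of size $\Delta t$:

$$y_i[k+1] = \left(1 - \frac{\Delta t}{\tau_i}\right) y_i[k] + \frac{\Delta t}{\tau_i}\left(\sum_j w_{ji}\,\sigma(y_j[k]-\Theta_j) + I_i\right)$$

This is itself a discrete-time recurrent update. Forward Euler is an assumption chosen for simplicity; other discretizations exist.

The sketch below implements that step. In the usage example, a single unconnected neuron with constant input relaxes towards $y = I$, as the equation predicts.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, w, tau, theta, I, dt):
    """Forward-Euler step of tau_i dy_i/dt = -y_i + sum_j w_ji sigma(y_j - theta_j) + I_i.
    w[j][i] is the weight from presynaptic node j to postsynaptic node i."""
    n = len(y)
    return [y[i] + (dt / tau[i]) * (-y[i]
                                    + sum(w[j][i] * sigmoid(y[j] - theta[j]) for j in range(n))
                                    + I[i])
            for i in range(n)]

y = [0.0]
for _ in range(2000):   # 2000 steps of dt = 0.01, i.e. 20 time constants
    y = ctrnn_step(y, w=[[0.0]], tau=[1.0], theta=[0.0], I=[2.0], dt=0.01)
print(y[0])             # very close to 2.0
```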

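Hierarchical

Hierarchical RNNs connect their neurons in various ways to decompose hierarchical behavior into useful subprograms.^[36]^[55]

Recurrent multilayer perceptron network

Generally, a recurrent multilayer perceptron (RMLP) network consists of cascaded subnetworks, each of which contains multiple layers of nodes. Each subnetwork is feed-forward except for its last layer, which can have feedback connections. The subnetworks are connected to one another only by feed-forward connections.^[56]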

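Multiple timescales model

A multiple timescales recurrent neural network (MTRNN) is a computational model based on neural networks. It can simulate the functional hierarchy of the brain through self-organization that depends on two things: the spatial connections between neurons, and distinct types of neuron activity, each with distinct time properties.^[57]^[58] With such varied neuronal activities, continuous sequences of behaviors are segmented into reusable primitives. These primitives are in turn flexibly integrated into diverse sequential behaviors. The biological plausibility of this type of hierarchy was discussed in the memory-prediction theory of brain function in Jeff Hawkins's book On Intelligence (Japanese edition 『考える脳 考えるコンピューター』, 2005).^[citation needed]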

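Neural Turing machine

Main article: Neural Turing machine

Neural Turing machines (NTMs) extend recurrent neural networks by coupling them to external memory resources. The RNN interacts with the external memory through attention processes. The combined system is analogous to a Turing machine or a von Neumann architecture, but it is differentiable end to end. This allows it to be trained efficiently with gradient descent.^[59]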
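As an illustration of the attention process, here is a minimal sketch of a content-based read head in the style of the NTM paper. Attention weights over memory rows are a softmax of the cosine similarity between a key and each row, sharpened by a strength $\beta$. The read vector is the weighted sum of the rows. Every operation is smooth, which is what keeps the whole system differentiable.

This sketch omits several parts of a full NTM:

- location-based shifting;
- interpolation with the previous weights;
- sharpening after the shift;
- write heads.

The memory contents and key are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) + eps)

def content_read(memory, key, beta):
    """Soft read: softmax(beta * cosine(key, row)) over rows, then a weighted sum."""
    scores = [beta * cosine(key, row) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]        # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    read = [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(len(memory[0]))]
    return weights, read

memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, read = content_read(memory, key=[0.9, 0.1, 0.0], beta=10.0)
print([round(w, 4) for w in weights], [round(r, 4) for r in read])
```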

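Differentiable neural computer

Main article: Differentiable neural computer

Differentiable neural computers (DNCs) are an extension of neural Turing machines. They can use fuzzy amounts of each memory address and keep a record of the order of events.

Neural network pushdown automata

Neural network pushdown automata (NNPDA) are similar to NTMs, but the (input) tape is replaced by an analog stack that is differentiable and trained. In this way, NNPDAs are similar in complexity to recognizers of context-free grammars (CFGs).^[60]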

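Linear recurrence

A linear recurrence is a recurrent module or layer without a nonlinear activation function.

By definition, neural networks, including RNNs, do not require nonlinear activation functions.^[61] In practice, however, nonlinear transformations such as the sigmoid function are almost always introduced. As a result, when the state $h_{t-1}$ is fed back, it passes through a nonlinear transformation on its way into $f(x_{t},h_{t-1})$.^[62] Modules and layers have been proposed that remove this nonlinear transformation along the sequence (time) direction, making the recurrence linear.^[63]^[64]^[65]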
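One consequence of removing the nonlinearity can be checked directly. Take a scalar linear recurrence $h_t = a h_{t-1} + b x_t$ with output $y_t = c h_t$. Unrolling it gives a convolution, $y_t = \sum_{k=0}^{t} c a^k b \, x_{t-k}$. The scalar parameters below are assumptions for illustration. The cited linear state-space layers work with matrices, but the same unrolling argument is what lets such layers be computed either recurrently or as a convolution.

```python
def linear_recurrence(xs, a, b, c):
    """Step-by-step form: no nonlinearity on the state path."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def as_convolution(xs, a, b, c):
    """Unrolled form: convolve the input with the kernel K_k = c * a**k * b."""
    K = [c * a ** k * b for k in range(len(xs))]
    return [sum(K[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]

xs = [1.0, -2.0, 0.5, 3.0]
print(linear_recurrence(xs, 0.9, 0.5, 2.0))
print(as_convolution(xs, 0.9, 0.5, 2.0))   # the same values up to rounding
```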

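Training

Gradient descent

Main article: Gradient descent

Gradient descent is a first-order iterative optimization algorithm for finding a (local) minimum of a function. In neural networks, it can be used to minimize the error term by changing each weight in proportion to the negative of the derivative of the error with respect to that weight. This requires the nonlinear activation functions to be differentiable. Various methods for doing so were developed in the 1980s and early 1990s by Werbos, Williams, Robinson, Schmidhuber, Hochreiter, Pearlmutter and others.

The standard method is called "backpropagation through time" (BPTT). It is a generalization of backpropagation for feed-forward networks.^[66]^[67] Like backpropagation, BPTT is an instance of automatic differentiation in the reverse accumulation mode of Pontryagin's minimum principle. A more computationally expensive online variant is called "Real-Time Recurrent Learning" (RTRL).^[68]^[69] It is an instance of automatic differentiation in the forward accumulation mode with stacked tangent vectors. Unlike BPTT, this algorithm is local in time but not local in space.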
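A minimal BPTT sketch for a scalar RNN, $h_t = \tanh(w h_{t-1} + u x_t)$ with loss $L = \tfrac12 \sum_t (h_t - d_t)^2$. The model, loss and numbers are assumptions for illustration. The forward pass stores every activation. The backward sweep then runs from the last step to the first. At each step it adds the local error to the error arriving from the future, and it accumulates gradients for the shared weights $w$ and $u$. A central finite difference is used as a check.

```python
import math

def forward(w, u, xs):
    """h_t = tanh(w h_{t-1} + u x_t), h_0 = 0; returns [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, ds):
    hs = forward(w, u, xs)
    return 0.5 * sum((h - d) ** 2 for h, d in zip(hs[1:], ds))

def bptt(w, u, xs, ds):
    """Reverse-mode gradient of `loss`: store all activations, then sweep backwards."""
    hs = forward(w, u, xs)
    gw = gu = 0.0
    carry = 0.0                           # dL/dh_t arriving from step t+1
    for t in range(len(xs), 0, -1):
        dh = (hs[t] - ds[t - 1]) + carry  # local error + error from the future
        da = dh * (1.0 - hs[t] ** 2)      # back through tanh
        gw += da * hs[t - 1]              # w and u are shared by every time step
        gu += da * xs[t - 1]
        carry = da * w                    # pass the error on to h_{t-1}
    return gw, gu

xs, ds = [0.5, -1.0, 0.3, 0.8], [0.1, -0.2, 0.0, 0.4]
w, u = 0.7, 1.2
gw, gu = bptt(w, u, xs, ds)
eps = 1e-6
print(gw, (loss(w + eps, u, xs, ds) - loss(w - eps, u, xs, ds)) / (2 * eps))
```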

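In this context, local in space means that a unit's weight vector can be updated using only information stored in the connected units and in the unit itself. The update complexity of a single unit is then linear in the dimensionality of the weight vector. Local in time means that the updates take place continually (online). Each update depends only on the most recent time step, rather than on multiple time steps within a given time horizon as in BPTT. Biological neural networks appear to be local with respect to both time and space.^[70]^[71]

For the recursive computation of the partial derivatives, RTRL has a time complexity of O(number of hidden units × number of weights) per time step for computing the Jacobian matrices. BPTT takes only O(number of weights) per time step, at the cost of storing all forward activations within the given time horizon.^[72] An online hybrid between BPTT and RTRL with intermediate complexity exists,^[73]^[74] along with variants for continuous time.^[75]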
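For contrast with BPTT, here is a minimal RTRL sketch for a scalar RNN, $h_t = \tanh(w h_{t-1} + u x_t)$ with loss $\tfrac12 \sum_t (h_t - d_t)^2$. The model and numbers are illustrative assumptions. Instead of storing the history, RTRL carries the sensitivities $\partial h_t/\partial w$ and $\partial h_t/\partial u$ forward in time, so each step's gradient contribution is available immediately.

In this scalar case there is one sensitivity per weight. With $n$ hidden units, the sensitivities form an $n \times$ (number of weights) matrix that must be updated at every step, which is where RTRL's extra cost comes from.

```python
import math

def loss(w, u, xs, ds):
    h, total = 0.0, 0.0
    for x, d in zip(xs, ds):
        h = math.tanh(w * h + u * x)
        total += 0.5 * (h - d) ** 2
    return total

def rtrl(w, u, xs, ds):
    """Forward-mode gradient of `loss`, computed online in a single pass."""
    h = 0.0
    pw = pu = 0.0                     # sensitivities dh_t/dw and dh_t/du
    gw = gu = 0.0
    for x, d in zip(xs, ds):
        h_prev = h
        h = math.tanh(w * h_prev + u * x)
        s = 1.0 - h * h               # tanh'(a_t)
        pw = s * (h_prev + w * pw)    # d a_t/dw = h_{t-1} + w * dh_{t-1}/dw
        pu = s * (x + w * pu)         # d a_t/du = x_t + w * dh_{t-1}/du
        gw += (h - d) * pw            # this step's contribution, no stored history
        gu += (h - d) * pu
    return gw, gu

xs, ds = [0.5, -1.0, 0.3, 0.8], [0.1, -0.2, 0.0, 0.4]
w, u = 0.7, 1.2
gw, gu = rtrl(w, u, xs, ds)
print(gw, gu)
```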

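A major problem with gradient descent for standard RNN architectures is that error gradients vanish exponentially quickly as the time lag between important events grows.^[38]^[76] LSTM combined with a hybrid BPTT/RTRL learning method attempts to overcome these problems.^[77] The independently recurrent neural network (IndRNN)^[30] also solves this problem, by reducing the context of each neuron to its own past state. Information across neurons can then be explored in the following layers. Memories of different ranges, including long-term memory, can be learned without the vanishing and exploding gradient problems.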
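The exponential decay can be seen in a few lines. For the input-free scalar RNN $h_t = \tanh(w h_{t-1})$, the chain rule gives $\partial h_T / \partial h_0 = \prod_{t=1}^{T} w\,(1 - h_t^2)$. When $|w| < 1$, every factor has magnitude at most $|w|$, so the gradient is bounded by $|w|^T$. The values $w = 0.9$ and $h_0 = 0.5$ are arbitrary choices for this sketch.

```python
import math

def gradient_through_time(w, h0, steps):
    """d h_T / d h_0 for the input-free scalar RNN h_t = tanh(w h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)   # chain rule: one factor w * tanh'(.) per step
    return grad

for T in (10, 50, 100):
    print(T, gradient_through_time(0.9, 0.5, T), 0.9 ** T)
```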

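The causal recursive backpropagation algorithm (CRBP) implements and combines the BPTT and RTRL paradigms for locally recurrent networks.^[78] It works with the most general locally recurrent networks. The CRBP algorithm can minimize the global error term. This improves the stability of the algorithm, and it provides a unifying view of gradient calculation techniques for recurrent networks with local feedback.

One approach to computing gradient information in RNNs with arbitrary architectures is based on the diagrammatic derivation of signal-flow graphs.^[79] It uses the BPTT batch algorithm, based on Lee's theorem for network sensitivity calculations.^[80] It was proposed by Wan and Beaufays, and its fast online version was proposed by Campolucci, Uncini and Piazza.^[80]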

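Global optimization methods

Training the weights of a neural network can be modeled as a nonlinear global optimization problem. A target function can be formed to evaluate the fitness or error of a particular weight vector as follows. First, the weights in the network are set according to the weight vector. Next, the network is evaluated against the training sequence. Typically, the sum of squared differences between the predictions and the target values specified in the training sequence represents the error of the current weight vector. Any global optimization technique can then be used to minimize this target function.

The most common global optimization method for training RNNs is the genetic algorithm, especially for unstructured networks.^[81]^[82]^[83]

First, the neural network weights are encoded for the genetic algorithm in a predefined manner, where one gene in the chromosome represents one weight link. The whole network is represented as a single chromosome. The fitness function is evaluated as follows:

Each weight encoded in the chromosome is assigned to the corresponding weight link of the network.
The training set is presented to the network, which propagates the input signals forward.
The mean squared error is returned to the fitness function.
This function drives the genetic selection process.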

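Many chromosomes make up the population, so many different neural networks are evolved until a stopping criterion is satisfied. Common stopping criteria are:

when the neural network has learned a certain percentage of the training data, or
when the mean squared error falls below a given minimum, or
when the maximum number of training generations has been reached.

The stopping criterion is evaluated by the fitness function, which takes the reciprocal of the mean squared error of each network during training. The goal of the genetic algorithm is therefore to maximize the fitness function, which reduces the mean squared error.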
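The following sketch follows the procedure above, with several choices that are illustrative assumptions:

- a three-weight scalar RNN, with one gene per weight link;
- a hypothetical "teacher" network that generates the targets;
- truncation selection;
- uniform crossover and Gaussian mutation;
- elitism.

Fitness is the reciprocal of the mean squared error (plus a tiny constant to avoid division by zero). Evolution stops either when the MSE drops below a threshold or after a maximum number of generations. Elitism makes the best MSE per generation non-increasing.

```python
import math
import random

def predict(chrom, xs):
    """Tiny RNN decoded from a chromosome: one gene per weight link."""
    w_in, w_rec, w_out = chrom
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        ys.append(w_out * h)
    return ys

def mse(chrom, xs, ds):
    return sum((y - d) ** 2 for y, d in zip(predict(chrom, xs), ds)) / len(ds)

def fitness(chrom, xs, ds):
    return 1.0 / (mse(chrom, xs, ds) + 1e-12)       # reciprocal of the MSE

def evolve(xs, ds, pop_size=30, max_generations=200, target_mse=1e-4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):                 # stop: generation limit
        pop.sort(key=lambda c: fitness(c, xs, ds), reverse=True)
        best = pop[0]
        history.append(mse(best, xs, ds))
        if history[-1] < target_mse:                 # stop: MSE target reached
            break
        parents = pop[: pop_size // 2]               # truncation selection
        children = [best[:]]                         # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]            # uniform crossover
            children.append([g + rng.gauss(0.0, 0.1) for g in child])   # mutation
        pop = children
    return best, history

xs = [math.sin(0.3 * t) for t in range(40)]
ds = predict([1.0, 0.5, 0.8], xs)      # targets from a hypothetical teacher network
best, history = evolve(xs, ds)
print(len(history), history[0], history[-1])
```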

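Other global (and/or evolutionary) optimization techniques, such as simulated annealing or particle swarm optimization, may also be used to search for a good set of weights.

Evaluation

The performance of RNN models is evaluated with a variety of tasks and metrics. Some examples follow.

Copying task

The copying task evaluates the memory of sequence-processing models: the model must recall, at the end of the sequence, the digits presented at the beginning.^[84]

The input has three phases:

- Memorize steps: the model is given 10 inputs, each sampled at random from $\{1,\ ...,\ 8\}$.
- Hold steps: it is given L zeros ($0$).
- Recall steps: it is given ten consecutive $9$s.

The model must memorize the first 10 digits and retain them during the L steps of $0$. In response to the $9$s, it must output the first 10 digits in order.^[85] The pseudocode below shows an input and the ideal output.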

#    |   memorize   |    hold    |    recall    |
i = [1,4,2,2,...,3,  0,0,...,0,  9,9,9,...,9]
o = [0,0,0,...............0,     1,4,2,...,3]

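The copying task requires retaining memories across long time lags^[86] and is a standard task for directly evaluating long-term memory. Although simple, it is known to be difficult. Simple RNNs such as the Elman network cannot solve it, and even LSTM learns L=100 only partially.^[87]^[88]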
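The task layout described above can be generated directly. This sketch builds one (input, target) pair. It assumes the target is 0 at every step before the recall phase, as in the pseudocode. It also includes a helper that scores only the recall phase. The function names and the seeded random generator are illustrative choices.

```python
import random

def copying_example(L, n_tokens=10, rng=random):
    """One (input, target) pair of the copying task with hold length L."""
    data = [rng.randint(1, 8) for _ in range(n_tokens)]   # memorize phase: digits 1..8
    inputs = data + [0] * L + [9] * n_tokens              # hold phase, then recall cue
    targets = [0] * (n_tokens + L) + data                 # reproduce the digits at the end
    return inputs, targets

def recall_accuracy(predicted, targets, n_tokens=10):
    """Fraction of recall-phase positions reproduced correctly."""
    pairs = zip(predicted[-n_tokens:], targets[-n_tokens:])
    return sum(p == t for p, t in pairs) / n_tokens

xs, ys = copying_example(L=100, rng=random.Random(0))
print(len(xs), xs[:10], ys[-10:])
```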

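Libraries

Major deep learning libraries support training and inference of RNNs, for example PyTorch/Caffe/Torch, TensorFlow/Keras, Chainer, Deeplearning4j, DyNet, Microsoft Cognitive Toolkit, MXNet and Theano. So do machine learning libraries such as Apache SINGA.

Applications

Applications of recurrent neural networks include:

Machine translation
Robot control^[90]
Time series prediction^[91]
Speech recognition^[92]^[93]^[94]
Time series anomaly detection^[95]
Rhythm learning^[96]
Music composition^[97]
Grammar learning^[98]^[99]^[100]
Handwriting recognition^[101]^[102]
Human action recognition^[103]
Protein homology detection^[104]
Prediction of the subcellular localization of proteins^[105]
Several prediction tasks in the area of business process management^[106]
Prediction in medical care pathways^[107]

References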

^ ^a ^b "If a network has one or more cycles, that is, if it is possible to follow a path from a unit back to itself, then the network is referred to as recurrent." Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. (Tech. Rep. No. 8604). San Diego: University of California, Institute for Cognitive Science.
^ Jinyu Li; Li Deng; Reinhold Haeb-Umbach; Yifan Gong (2015). Robust Automatic Speech Recognition. Academic Press. ISBN 978-0128023983
^ Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition” (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5).
^ ^a ^b Sak, Hasim (2014). “Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling”. Retrieved 5 April 2019.
^ ^a ^b Li, Xiangang; Wu, Xihong (15 October 2014). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
^ Miljanovic, Milos (Feb-Mar 2012). “Comparative analysis of Recurrent and Finite Impulse Response Neural Networks in Time Series Prediction”. Indian Journal of Computer and Engineering 3 (1).
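^ Okatani (岡谷) 2015, p. 112
^ Watanabe, Taro (渡辺太郎). “ニューラルネットワークによる構造学習の発展” [Advances in structure learning with neural networks]. 人工知能 (Journal of the Japanese Society for Artificial Intelligence) 31 (2): 202–209. NAID 110010039602.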
^ Williams, Ronald J.; Hinton, Geoffrey E.; Rumelhart, David E. (October 1986). “Learning representations by back-propagating errors”. Nature 323 (6088): 533–536. doi:10.1038/323533a0. ISSN 1476-4687.
^ ^a ^b Schmidhuber, Jürgen (1993). Habilitation thesis: System modeling and optimization Page 150 ff demonstrates credit assignment across the equivalent of 1,200 layers in an unfolded RNN.
^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). “An Application of Recurrent Neural Networks to Discriminative Keyword Spotting”. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07 (Berlin, Heidelberg: Springer-Verlag): 220–229. ISBN 978-3-540-74693-5.
^ ^a ^b ^c Schmidhuber, Jürgen (January 2015). “Deep Learning in Neural Networks: An Overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.
^ Graves, Alex; Schmidhuber, Jürgen (2009). Bengio, Yoshua. ed. “Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks”. Neural Information Processing Systems (NIPS) Foundation: 545–552.
^ Hannun, Awni; Case, Carl; Casper, Jared; Catanzaro, Bryan; Diamos, Greg; Elsen, Erich; Prenger, Ryan; Satheesh, Sanjeev; Sengupta, Shubho (17 December 2014). "Deep Speech: Scaling up end-to-end speech recognition". arXiv:1412.5567 [cs.CL].
^ Bo Fan, Lijuan Wang, Frank K. Soong, and Lei Xie (2015). Photo-Real Talking Head with Deep Bidirectional LSTM. In Proceedings of ICASSP 2015.
^ Zen, Heiga (2015). “Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis”. Google.com. ICASSP. pp. 4470–4474. Retrieved 5 April 2019.
^ Sak, Haşim (September 2015). “Google voice search: faster and more accurate”. Retrieved 5 April 2019.
^ Sutskever, I.; Vinyals, O.; Le, Q. (2014). “Sequence to Sequence Learning with Neural Networks”. Electronic Proceedings of the Neural Information Processing Systems Conference 27: 5346. arXiv:1409.3215. Bibcode: 2014arXiv1409.3215S.
^ Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (7 February 2016). "Exploring the Limits of Language Modeling". arXiv:1602.02410 [cs.CL].
^ Gillick, Dan; Brunk, Cliff; Vinyals, Oriol; Subramanya, Amarnag (30 November 2015). "Multilingual Language Processing From Bytes". arXiv:1512.00103 [cs.CL].
^ Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru (17 November 2014). "Show and Tell: A Neural Image Caption Generator". arXiv:1411.4555 [cs.CV].
^ ^a ^b Cruse, Holk; Neural Networks as Cybernetic Systems, 2nd and revised edition
^ Elman, Jeffrey L. (1990). “Finding Structure in Time”. Cognitive Science 14 (2): 179–211. doi:10.1016/0364-0213(90)90002-E.
^ Jordan, Michael I. (1997-01-01). Serial Order: A Parallel Distributed Processing Approach. Neural-Network Models of Cognition. 121. 471–495. doi:10.1016/s0166-4115(97)80111-2. ISBN 9780444819314
^ Kosko, B. (1988). “Bidirectional associative memories”. IEEE Transactions on Systems, Man, and Cybernetics 18 (1): 49–60. doi:10.1109/21.87054.
^ Rakkiyappan, R.; Chandrasekar, A.; Lakshmanan, S.; Park, Ju H. (2 January 2015). “Exponential stability for markovian jumping stochastic BAM neural networks with mode-dependent probabilistic time-varying delays and impulse control”. Complexity 20 (3): 39–65. Bibcode: 2015Cmplx..20c..39R. doi:10.1002/cplx.21503.
^ Rául Rojas (1996). Neural networks: a systematic introduction. Springer. p. 336. ISBN 978-3-540-60505-8
^ Jaeger, Herbert; Haas, Harald (2004-04-02). “Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication”. Science 304 (5667): 78–80. Bibcode: 2004Sci...304...78J. doi:10.1126/science.1091277. PMID 15064413.
^ W. Maass, T. Natschläger, and H. Markram (2002). “A fresh look at real-time computation in generic recurrent neural circuits”. Technical report, Institute for Theoretical Computer Science (TU Graz).
^ ^a ^b Li, Shuai; Li, Wanqing; Cook, Chris; Zhu, Ce; Yanbo, Gao (2018). “Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN”. IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1803.04831.
^ Goller, C.; Küchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. 1. 347. doi:10.1109/ICNN.1996.548916. ISBN 978-0-7803-3210-2
^ Seppo Linnainmaa (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.
^ Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (Second ed.). SIAM. ISBN 978-0-89871-776-1
^ Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D., “Parsing Natural Scenes and Natural Language with Recursive Neural Networks”, 28th International Conference on Machine Learning (ICML 2011)
^ Socher, Richard; Perelygin, Alex; Y. Wu, Jean; Chuang, Jason; D. Manning, Christopher; Y. Ng, Andrew; Potts, Christopher. “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”. Emnlp 2013.
^ ^a ^b ^c ^d Schmidhuber, Jürgen (1992). “Learning complex, extended sequences using the principle of history compression”. Neural Computation 4 (2): 234–242. doi:10.1162/neco.1992.4.2.234.
^ Schmidhuber, Jürgen (2015). “Deep Learning”. Scholarpedia 10 (11): 32832. Bibcode: 2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.
^ ^a ^b ^c Sepp Hochreiter (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber.
^ C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, Y.C. Lee, "Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks", Neural Computation, 4(3), p. 393, 1992.
^ C.W. Omlin, C.L. Giles, "Constructing Deterministic Finite-State Automata in Recurrent Neural Networks" Journal of the ACM, 45(6), 937-972, 1996.
^ Gers, Felix; Schraudolph, Nicol N.; Schmidhuber, Jürgen (2002). “Learning Precise Timing with LSTM Recurrent Networks”. Journal of Machine Learning Research 3: 115–143. doi:10.1162/153244303768966139. Retrieved 5 April 2019.
^ Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Jürgen (2009-09-14). Evolving Memory Cell Structures for Sequence Learning. Lecture Notes in Computer Science. 5769. Springer, Berlin, Heidelberg. 755–764. doi:10.1007/978-3-642-04277-5_76. ISBN 978-3-642-04276-8
^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). “Sequence labelling in structured domains with hierarchical recurrent neural networks”. Proc. 20th Int. Joint Conf. On Artificial Intelligence, Ijcai 2007: 774–779.
^ Graves, Alex; Fernández, Santiago; Gomez, Faustino (2006). “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks”. In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376.
^ Gers, F. A.; Schmidhuber, J. (November 2001). “LSTM recurrent networks learn simple context-free and context-sensitive languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. ISSN 1045-9227. PMID 18249962.
^ Heck, Joel; Salem, Fathi M. (12 January 2017). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
^ Dey, Rahul; Salem, Fathi M. (20 January 2017). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
^ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
^ “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML” (27 October 2015). Retrieved 5 April 2019.
^ Graves, Alex; Schmidhuber, Jürgen (2005-07-01). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. Neural Networks. IJCNN 2005 18 (5): 602–610. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
^ Thireou, T.; Reczko, M. (July 2007). “Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins”. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4 (3): 441–446. doi:10.1109/tcbb.2007.1015.
^ Harvey, Inman; Husbands, P.; Cliff, D. (1994), “Seeing the light: Artificial evolution, real vision”, 3rd international conference on Simulation of adaptive behavior: from animals to animats 3, pp. 392–401
^ Quinn, Matthew (2001). Evolving communication without dedicated communication channels. Lecture Notes in Computer Science. 2159. 357–366. doi:10.1007/3-540-44811-X_38. ISBN 978-3-540-42567-0
^ Beer, R.D. (1997). “The dynamics of adaptive behavior: A research program”. Robotics and Autonomous Systems 20 (2–4): 257–289. doi:10.1016/S0921-8890(96)00063-2.
^ Paine, Rainer W.; Tani, Jun (2005-09-01). “How Hierarchical Control Self-organizes in Artificial Adaptive Systems”. Adaptive Behavior 13 (3): 211–225. doi:10.1177/105971230501300303.
^ Recurrent Multilayer Perceptrons for Identification and Control: The Road to Applications. (1995)
^ Yamashita, Yuichi; Tani, Jun (2008-11-07). “Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment”. PLOS Computational Biology 4 (11): e1000220. Bibcode: 2008PLSCB...4E0220Y. doi:10.1371/journal.pcbi.1000220. PMC 2570613. PMID 18989398.
^ Shibata Alnajjar, Fady; Yamashita, Yuichi; Tani, Jun (2013). “The hierarchical and functional connectivity of higher-order cognitive mechanisms: neurorobotic model to investigate the stability and flexibility of working memory”. Frontiers in Neurorobotics 7: 2. doi:10.3389/fnbot.2013.00002. PMC 3575058. PMID 23423881.
^ Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines". arXiv:1410.5401 [cs.NE].
^ Sun, Guo-Zheng; Giles, C. Lee; Chen, Hsing-Hen (1998). “The Neural Network Pushdown Automaton: Architecture, Dynamics and Training”. In Giles, C. Lee. Adaptive Processing of Sequences and Data Structures. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 296–345. doi:10.1007/bfb0054003. ISBN 9783540643418
^ "letting μ be the value of the recurrent weight, and assuming for simplicity that the units are linear ..., the activation of the output unit at time t is given by $x_{2}(t)=\mu x_{2}(t-1)+w_{21}x_{1}(t)$ " Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. (Tech. Rep. No. 8604). San Diego: University of California, Institute for Cognitive Science.
^ "popular RNN models are nonlinear sequence models with activation functions between each time step." Albert Gu, et al. (2021). Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers.
^ Albert Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections. NeurIPS 2020.
^ "LSSLs are recurrent. ... LSSL can be discretized into a linear recurrence ... as a stateful recurrent model" Albert Gu, et al. (2021). Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers.
^ Albert Gu, et al. (2021). Efficiently Modeling Long Sequences with Structured State Spaces.
^ Werbos, Paul J. (1988). “Generalization of backpropagation with application to a recurrent gas market model”. Neural Networks 1 (4): 339–356. doi:10.1016/0893-6080(88)90007-x.
^ Rumelhart, David E. (1985). Learning Internal Representations by Error Propagation. Institute for Cognitive Science, University of California, San Diego
^ Robinson, A. J. (1987). The Utility Driven Dynamic Error Propagation Network. Technical Report CUED/F-INFENG/TR.1. University of Cambridge Department of Engineering
^ Williams, R. J.; Zipser, D. (1 February 2013). “Gradient-based learning algorithms for recurrent networks and their computational complexity”. In Backpropagation: Theory, Architectures, and Applications. Psychology Press. ISBN 978-1-134-77581-1
^ Schmidhuber, Jürgen (1989-01-01). “A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks”. Connection Science 1 (4): 403–412. doi:10.1080/09540098908915650.
^ Príncipe, José C.; Euliano, Neil R.; Lefebvre, W. Curt (2000). Neural and adaptive systems: fundamentals through simulations. Wiley. ISBN 978-0-471-35167-2
^ Ollivier, Yann; Tallec, Corentin; Charpiat, Guillaume (28 July 2015). "Training recurrent networks online without backtracking". arXiv:1507.07680 [cs.NE].
^ Schmidhuber, Jürgen (1992-03-01). “A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks”. Neural Computation 4 (2): 243–248. doi:10.1162/neco.1992.4.2.243.
^ Williams, R. J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report Technical Report NU-CCS-89-27. Boston: Northeastern University, College of Computer Science.
^ Pearlmutter, Barak A. (1989-06-01). “Learning State Space Trajectories in Recurrent Neural Networks”. Neural Computation 1 (2): 263–269. doi:10.1162/neco.1989.1.2.263.
^ Hochreiter, S. (15 January 2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. A Field Guide to Dynamical Recurrent Networks. John Wiley & Sons. ISBN 978-0-7803-5369-5
^ Hochreiter, Sepp; Schmidhuber, Jürgen (1997-11-01). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735.
^ Campolucci; Uncini, A.; Piazza, F.; Rao, B. D. (1999). “On-Line Learning Algorithms for Locally Recurrent Neural Networks”. IEEE Transactions on Neural Networks 10 (2): 253–271. doi:10.1109/72.750549. PMID 18252525.
^ Wan, E. A.; Beaufays, F. (1996). “Diagrammatic derivation of gradient algorithms for neural networks”. Neural Computation 8: 182–201. doi:10.1162/neco.1996.8.1.182.
^ ^a ^b Campolucci, P.; Uncini, A.; Piazza, F. (2000). “A Signal-Flow-Graph Approach to On-line Gradient Calculation”. Neural Computation 12 (8): 1901–1927. doi:10.1162/089976600300015196.
^ Gomez, F. J.; Miikkulainen, R. (1999), “Solving non-Markovian control tasks with neuroevolution”, IJCAI 99, Morgan Kaufmann. Retrieved 5 April 2019.
^ “Applying Genetic Algorithms to Recurrent Neural Networks for Learning Network Parameters and Architecture”. Retrieved 5 April 2019.
^ Gomez, Faustino; Schmidhuber, Jürgen; Miikkulainen, Risto (June 2008). “Accelerated Neural Evolution Through Cooperatively Coevolved Synapses”. J. Mach. Learn. Res. 9: 937–965.
^ "Copying task. This standard RNN task ... directly tests memorization, where models must regurgitate a sequence of tokens seen at the beginning of the sequence." Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.
^ " the first 10 tokens (a0, a1, . . . , a9) are randomly chosen from {1, . . . , 8}, the middle N tokens are set to 0, and the last ten tokens are 9. The goal of the recurrent model is to output (a0, . . . , a9) in order on the last 10 time steps, whenever the cue token 9 is presented." Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.
^ "ability to recall exactly data seen a long time ago." Arjovsky, et al. (2015). Unitary Evolution Recurrent Neural Networks.
^ Figure 1 of Arjovsky, et al. (2015). Unitary Evolution Recurrent Neural Networks.
^ Figure 7 of Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.
^ Siegelmann, Hava T.; Horne, Bill G.; Giles, C. Lee (1995). Computational Capabilities of Recurrent NARX Neural Networks. University of Maryland
^ Mayer, H.; Gomez, F.; Wierstra, D.; Nagy, I.; Knoll, A.; Schmidhuber, J. (October 2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 543–548. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8
^ Wierstra, Daan; Schmidhuber, J.; Gomez, F. J. (2005). “Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning”. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858. https://www.academia.edu/5830256.
^ Graves, A.; Schmidhuber, J. (2005). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. Neural Networks 18 (5–6): 602–610. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. ICANN'07. Berlin, Heidelberg: Springer-Verlag. 220–229. ISBN 978-3540746935
^ Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). “Speech Recognition with Deep Recurrent Neural Networks”. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649.
^ Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). “Long Short Term Memory Networks for Anomaly Detection in Time Series”. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015.
^ Gers, F.; Schraudolph, N.; Schmidhuber, J. (2002). “Learning precise timing with LSTM recurrent networks”. Journal of Machine Learning Research 3: 115–143.
^ Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Lecture Notes in Computer Science. 2415. Springer, Berlin, Heidelberg. 284–289. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848
^ Schmidhuber, J.; Gers, F.; Eck, D. (2002). “Learning nonregular languages: A comparison of simple recurrent networks and LSTM”. Neural Computation 14 (9): 2039–2041. doi:10.1162/089976602320263980. PMID 12184841.
^ Gers, F. A.; Schmidhuber, J. (2001). “LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. PMID 18249962.
^ Perez-Ortiz, J. A.; Gers, F. A.; Eck, D.; Schmidhuber, J. (2003). “Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets”. Neural Networks 16 (2): 241–250. doi:10.1016/s0893-6080(02)00219-8.
^ A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, pp 545–552, Vancouver, MIT Press, 2009.
^ Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. NIPS'07. USA: Curran Associates Inc. 577–584. ISBN 9781605603520
^ M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, A. Baskurt. Sequential Deep Learning for Human Action Recognition. 2nd International Workshop on Human Behavior Understanding (HBU), A.A. Salah, B. Lepri ed. Amsterdam, Netherlands. pp. 29–39. Lecture Notes in Computer Science 7065. Springer. 2011
^ Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). “Fast model-based protein homology detection without alignment”. Bioinformatics 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.
^ Thireou, T.; Reczko, M. (2007). “Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins”. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763.
^ Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). Predictive Business Process Monitoring with LSTM neural networks. Lecture Notes in Computer Science. 10253. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1
^ Choi, E.; Bahadori, M.T.; Schuetz, E.; Stewart, W.; Sun, J. (2016). “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks”. Proceedings of the 1st Machine Learning for Healthcare Conference: 301–318.

References

Okatani, Takayuki (岡谷貴之) (2015). Deep Learning (深層学習). Machine Learning Professional Series. Kodansha. ISBN 978-4061529021.
Mandic, D. & Chambers, J. (2001). Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. Wiley. ISBN 0-471-49517-4

External links

RNNSharp: conditional random fields based on recurrent neural networks (C#, .NET)
Recurrent Neural Networks: a collection of more than 60 papers on RNNs by Jürgen Schmidhuber's group at the Dalle Molle Institute for Artificial Intelligence Research
Elman Neural Network implementation for WEKA
Recurrent Neural Nets & LSTMs in Java

[:0-1] "If a network has one or more cycles, that is, if it is possible to follow a path from a unit back to itself, then the network is referred to as recurrent." Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. (Tech. Rep. No. 8604). San Diego: University of California, Institute for Cognitive Science.

[2] Jinyu Li Li Deng Reinhold Haeb-Umbach Yifan Gong (2015). Robust Automatic Speech Recognition. Academic Press. ISBN 978-0128023983

[3] Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition” (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5).

[sak2014-4] Sak, Hasim (2014年). “Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling”. 2019年4月5日閲覧。

[liwu2015-5] Li, Xiangang; Wu, Xihong (15 October 2014). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL]。

[6] Miljanovic, Milos (Feb-Mar 2012). “Comparative analysis of Recurrent and Finite Impulse Response Neural Networks in Time Series Prediction”. Indian Journal of Computer and Engineering 3 (1).

[okatani-7] 岡谷 2015, pp. 112

[watanabe-8] 渡辺太郎「ニューラルネットワークによる構造学習の発展」『人工知能』第31巻第2号、202--209頁、NAID 110010039602。

[9] Williams, Ronald J.; Hinton, Geoffrey E.; Rumelhart, David E. (October 1986). “Learning representations by back-propagating errors”. Nature 323 (6088): 533–536. doi:10.1038/323533a0. ISSN 1476-4687.

[schmidhuber1993-10] Schmidhuber, Jürgen (1993). Habilitation thesis: System modeling and optimization Page 150 ff demonstrates credit assignment across the equivalent of 1,200 layers in an unfolded RNN.

[fernandez2007keyword-11] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). “An Application of Recurrent Neural Networks to Discriminative Keyword Spotting”. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07 (Berlin, Heidelberg: Springer-Verlag): 220–229. ISBN 978-3-540-74693-5.

[schmidhuber2015-12] Schmidhuber, Jürgen (January 2015). “Deep Learning in Neural Networks: An Overview”. Neural Networks 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637.

[graves20093-13] Graves, Alex; Schmidhuber, Jürgen (2009). Bengio, Yoshua. ed. “Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks”. Neural Information Processing Systems (NIPS) Foundation: 545–552.

[hannun2014-14] Hannun, Awni; Case, Carl; Casper, Jared; Catanzaro, Bryan; Diamos, Greg; Elsen, Erich; Prenger, Ryan; Satheesh, Sanjeev; Sengupta, Shubho (17 December 2014). "Deep Speech: Scaling up end-to-end speech recognition". arXiv:1412.5567 [cs.CL]。

[fan2015-15] Bo Fan, Lijuan Wang, Frank K. Soong, and Lei Xie (2015). Photo-Real Talking Head with Deep Bidirectional LSTM. In Proceedings of ICASSP 2015.

[zen2015-16] Zen, Heiga (2015年). “Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis”. Google.com. ICASSP. pp. 4470–4474. 2019年4月5日閲覧。

[sak2015-17] Sak, Haşim (2015年9月). “Google voice search: faster and more accurate”. 2019年4月5日閲覧。

[sutskever2014-18] Sutskever, L.; Vinyals, O.; Le, Q. (2014). “Sequence to Sequence Learning with Neural Networks”. Electronic Proceedings of the Neural Information Processing Systems Conference 27: 5346. arXiv:1409.3215. Bibcode: 2014arXiv1409.3215S.

[vinyals2016-19] Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (7 February 2016). "Exploring the Limits of Language Modeling". arXiv:1602.02410 [cs.CL]。

[gillick2015-20] Gillick, Dan; Brunk, Cliff; Vinyals, Oriol; Subramanya, Amarnag (30 November 2015). "Multilingual Language Processing From Bytes". arXiv:1512.00103 [cs.CL]。

[vinyals2015-21] Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru (17 November 2014). "Show and Tell: A Neural Image Caption Generator". arXiv:1411.4555 [cs.CV]。

[bmm615-22] Cruse, Holk; Neural Networks as Cybernetic Systems, 2nd and revised edition

[23] Elman, Jeffrey L. (1990). “Finding Structure in Time”. Cognitive Science 14 (2): 179–211. doi:10.1016/0364-0213(90)90002-E.

[24] Jordan, Michael I. (1997-01-01). Serial Order: A Parallel Distributed Processing Approach. Neural-Network Models of Cognition. 121. 471–495. doi:10.1016/s0166-4115(97)80111-2. ISBN 9780444819314

[25] Kosko, B. (1988). “Bidirectional associative memories”. IEEE Transactions on Systems, Man, and Cybernetics 18 (1): 49–60. doi:10.1109/21.87054.

[26] Rakkiyappan, R.; Chandrasekar, A.; Lakshmanan, S.; Park, Ju H. (2 January 2015). “Exponential stability for markovian jumping stochastic BAM neural networks with mode-dependent probabilistic time-varying delays and impulse control”. Complexity 20 (3): 39–65. Bibcode: 2015Cmplx..20c..39R. doi:10.1002/cplx.21503.

[27] Rául Rojas (1996). Neural networks: a systematic introduction. Springer. p. 336. ISBN 978-3-540-60505-8

[28] Jaeger, Herbert; Haas, Harald (2004-04-02). “Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication”. Science 304 (5667): 78–80. Bibcode: 2004Sci...304...78J. doi:10.1126/science.1091277. PMID 15064413.

[29] W. Maass, T. Natschläger, and H. Markram (2002). “A fresh look at real-time computation in generic recurrent neural circuits”. Technical report, Institute for Theoretical Computer Science (TU Graz).

[auto-30] Li, Shuai; Li, Wanqing; Cook, Chris; Zhu, Ce; Yanbo, Gao (2018). “Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN”. IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1803.04831.

[31] Goller, C.; Küchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. 1. 347. doi:10.1109/ICNN.1996.548916. ISBN 978-0-7803-3210-2

[lin1970-32] Seppo Linnainmaa (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.

[grie2008-33] Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (Second ed.). SIAM. ISBN 978-0-89871-776-1

[34] Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D., “Parsing Natural Scenes and Natural Language with Recursive Neural Networks”, 28th International Conference on Machine Learning (ICML 2011)

[35] Socher, Richard; Perelygin, Alex; Y. Wu, Jean; Chuang, Jason; D. Manning, Christopher; Y. Ng, Andrew; Potts, Christopher. “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”. Emnlp 2013.

[schmidhuber1992-36] Schmidhuber, Jürgen (1992). “Learning complex, extended sequences using the principle of history compression”. Neural Computation 4 (2): 234–242. doi:10.1162/neco.1992.4.2.234.

[scholarpedia2015pre-37] Schmidhuber, Jürgen (2015). “Deep Learning”. Scholarpedia 10 (11): 32832. Bibcode: 2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.

[hochreiter1991-38] Sepp Hochreiter (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber.

[39] C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, Y.C. Lee, "Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks", Neural Computation, 4(3), p. 393, 1992.

[40] C.W. Omlin, C.L. Giles, "Constructing Deterministic Finite-State Automata in Recurrent Neural Networks" Journal of the ACM, 45(6), 937-972, 1996.

[gers2002-41] Gers, Felix; Schraudolph, Nicol N.; Schmidhuber, Jürgen (2000). “Learning Precise Timing with LSTM Recurrent Networks (PDF Download Available)”. Crossref Listing of Deleted Dois 1. doi:10.1162/153244303768966139 2019年4月5日閲覧。.

[bayer2009-42] Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Jürgen (2009-09-14). Evolving Memory Cell Structures for Sequence Learning. Lecture Notes in Computer Science. 5769. Springer, Berlin, Heidelberg. 755–764. doi:10.1007/978-3-642-04277-5_76. ISBN 978-3-642-04276-8

[fernandez2007-43] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). “Sequence labelling in structured domains with hierarchical recurrent neural networks”. Proc. 20th Int. Joint Conf. On Artificial In℡ligence, Ijcai 2007: 774–779.

[graves2006-44] Graves, Alex; Fernández, Santiago; Gomez, Faustino (2006). “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks”. In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376.

[45] Gers, F. A.; Schmidhuber, E. (November 2001). “LSTM recurrent networks learn simple context-free and context-sensitive languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. ISSN 1045-9227. PMID 18249962.

[46] Heck, Joel; Salem, Fathi M. (12 January 2017). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE]。

[47] Dey, Rahul; Salem, Fathi M. (20 January 2017). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE]。

[MyUser_Arxiv.org_May_18_2016c-48] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE]。

[MyUser_Wildml.com_May_18_2016c-49] “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML” (2015年10月27日). 2019年4月5日閲覧。

[50] Graves, Alex; Schmidhuber, Jürgen (2005-07-01). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. Neural Networks. IJCNN 2005 18 (5): 602–610. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.

[51] Thireou, T.; Reczko, M. (July 2007). “Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins”. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4 (3): 441–446. doi:10.1109/tcbb.2007.1015.

[52] Harvey, Inman; Husbands, P.; Cliff, D. (1994), “Seeing the light: Artificial evolution, real vision”, 3rd international conference on Simulation of adaptive behavior: from animals to animats 3, pp. 392–401

[Evolving_communication_without_dedicated_communication_channels-53] Quinn, Matthew (2001). Evolving communication without dedicated communication channels. Lecture Notes in Computer Science. 2159. 357–366. doi:10.1007/3-540-44811-X_38. ISBN 978-3-540-42567-0

[The_dynamics_of_adaptive_behavior:_A_research_program-54] Beer, R.D. (1997). “The dynamics of adaptive behavior: A research program”. Robotics and Autonomous Systems 20 (2–4): 257–289. doi:10.1016/S0921-8890(96)00063-2.

[55] Paine, Rainer W.; Tani, Jun (2005-09-01). “How Hierarchical Control Self-organizes in Artificial Adaptive Systems”. Adaptive Behavior 13 (3): 211–225. doi:10.1177/105971230501300303.

[56] Recurrent Multilayer Perceptrons for Identification and Control: The Road to Applications. (1995)

[57] Yamashita, Yuichi; Tani, Jun (2008-11-07). “Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment”. PLOS Computational Biology 4 (11): e1000220. Bibcode: 2008PLSCB...4E0220Y. doi:10.1371/journal.pcbi.1000220. PMC 2570613. PMID 18989398.

[58] Shibata Alnajjar, Fady; Yamashita, Yuichi; Tani, Jun (2013). “The hierarchical and functional connectivity of higher-order cognitive mechanisms: neurorobotic model to investigate the stability and flexibility of working memory”. Frontiers in Neurorobotics 7: 2. doi:10.3389/fnbot.2013.00002. PMC 3575058. PMID 23423881.

[59] Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines". arXiv:1410.5401 [cs.NE]。

[60] Sun, Guo-Zheng; Giles, C. Lee; Chen, Hsing-Hen (1998). “The Neural Network Pushdown Automaton: Architecture, Dynamics and Training”. In Giles, C. Lee. Adaptive Processing of Sequences and Data Structures. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 296–345. doi:10.1007/bfb0054003. ISBN 9783540643418

[61] "letting μ be the value of the recurrent weight, and assuming for simplicity that the units are linear ..., the activation of the output unit at time t is given by $x_{2}(t)=\mu x_{2}(t-1)+w_{21}x_{1}(t)$ " Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. (Tech. Rep. No. 8604). San Diego: University of California, Institute for Cognitive Science.

[62] "popular RNN models are nonlinear sequence models with activation functions between each time step." Albert Gu, et al. (2021). Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers.

[63] Albert Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections. NeurIPS 2020.

[64] "LSSLs are recurrent. ... LSSL can be discretized into a linear recurrence ... as a stateful recurrent model" Albert Gu, et al. (2021). Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers.

[65] Albert Gu, et al. (2021). Efficiently Modeling Long Sequences with Structured State Spaces.

[66] Werbos, Paul J. (1988). “Generalization of backpropagation with application to a recurrent gas market model”. Neural Networks 1 (4): 339–356. doi:10.1016/0893-6080(88)90007-x.

[67] Rumelhart, David E. (1985). Learning Internal Representations by Error Propagation. Institute for Cognitive Science, University of California, San Diego

[68] Robinson, A. J. (1987). The Utility Driven Dynamic Error Propagation Network. Technical Report CUED/F-INFENG/TR.1. University of Cambridge Department of Engineering

[69] Williams, R. J.; Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity, D. (1 February 2013). Backpropagation: Theory, Architectures, and Applications. Psychology Press. ISBN 978-1-134-77581-1

[70] SCHMIDHUBER, JURGEN (1989-01-01). “A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks”. Connection Science 1 (4): 403–412. doi:10.1080/09540098908915650.

[PríncipeEuliano2000-71] Príncipe, José C.; Euliano, Neil R.; Lefebvre, W. Curt (2000). Neural and adaptive systems: fundamentals through simulations. Wiley. ISBN 978-0-471-35167-2

[Ollivier2015-72] Yann, Ollivier; Corentin, Tallec; Guillaume, Charpiat (28 July 2015). "Training recurrent networks online without backtracking". arXiv:1507.07680 [cs.NE]。

[73] Schmidhuber, Jürgen (1992-03-01). “A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks”. Neural Computation 4 (2): 243–248. doi:10.1162/neco.1992.4.2.243.

[74] Williams, R. J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report Technical Report NU-CCS-89-27. Boston: Northeastern University, College of Computer Science.

[75] Pearlmutter, Barak A. (1989-06-01). “Learning State Space Trajectories in Recurrent Neural Networks”. Neural Computation 1 (2): 263–269. doi:10.1162/neco.1989.1.2.263.

[HOCH2001-76] Hochreiter, S. (15 January 2001). “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies”. A Field Guide to Dynamical Recurrent Networks. John Wiley & Sons. ISBN 978-0-7803-5369-5

[lstm-77] Hochreiter, Sepp; Schmidhuber, Jürgen (1997-11-01). “Long Short-Term Memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735.

[78] Campolucci; Uncini, A.; Piazza, F.; Rao, B. D. (1999). “On-Line Learning Algorithms for Locally Recurrent Neural Networks”. IEEE Transactions on Neural Networks 10 (2): 253–271. doi:10.1109/72.750549. PMID 18252525.

[79] Wan, E. A.; Beaufays, F. (1996). “Diagrammatic derivation of gradient algorithms for neural networks”. Neural Computation 8: 182–201. doi:10.1162/neco.1996.8.1.182.

[ReferenceA-80] Campolucci, P.; Uncini, A.; Piazza, F. (2000). “A Signal-Flow-Graph Approach to On-line Gradient Calculation”. Neural Computation 12 (8): 1901–1927. doi:10.1162/089976600300015196.

[81] Gomez, F. J.; Miikkulainen, R. (1999), “Solving non-Markovian control tasks with neuroevolution”, IJCAI 99, Morgan Kaufmann 2019年4月5日閲覧。

[82] “Applying Genetic Algorithms to Recurrent Neural Networks for Learning Network Parameters and Architecture”. 2019年4月5日閲覧。

[83] Gomez, Faustino; Schmidhuber, Jürgen; Miikkulainen, Risto (June 2008). “Accelerated Neural Evolution Through Cooperatively Coevolved Synapses”. J. Mach. Learn. Res. 9: 937–965.

[84] "Copying task. This standard RNN task ... directly tests memorization, where models must regurgitate a sequence of tokens seen at the beginning of the sequence." Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.

[85] " the first 10 tokens (a0, a1, . . . , a9) are randomly chosen from {1, . . . , 8}, the middle N tokens are set to 0, and the last ten tokens are 9. The goal of the recurrent model is to output (a0, . . . , a9) in order on the last 10 time steps, whenever the cue token 9 is presented." Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.

[86] "ability to recall exactly data seen a long time ago." Arjovsky, et al. (2015). Unitary Evolution Recurrent Neural Networks.

[87] Figure 1 of Arjovsky, et al. (2015). Unitary Evolution Recurrent Neural Networks.

[88] Figure 7 of Gu, et al. (2020). HiPPO: Recurrent Memory with Optimal Polynomial Projections.

[89] Siegelmann, Hava T.; Horne, Bill G.; Giles, C. Lee (1995). Computational Capabilities of Recurrent NARX Neural Networks. University of Maryland

[90] Mayer, H.; Gomez, F.; Wierstra, D.; Nagy, I.; Knoll, A.; Schmidhuber, J. (October 2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 543–548. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8

[91] Wierstra, Daan; Schmidhuber, J.; Gomez, F. J. (2005). “Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning”. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858. https://www.academia.edu/5830256.

[92] Graves, A.; Schmidhuber, J. (2005). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. Neural Networks 18 (5–6): 602–610. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.

[93] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. ICANN'07. Berlin, Heidelberg: Springer-Verlag. 220–229. ISBN 978-3540746935

[graves2013-94] Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). “Speech Recognition with Deep Recurrent Neural Networks”. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649.

[95] Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). “Long Short Term Memory Networks for Anomaly Detection in Time Series”. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015.

[peephole2002-96] Gers, F.; Schraudolph, N.; Schmidhuber, J. (2002). “Learning precise timing with LSTM recurrent networks”. Journal of Machine Learning Research 3: 115–143.

[97] Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Lecture Notes in Computer Science. 2415. Springer, Berlin, Heidelberg. 284–289. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848

[98] Schmidhuber, J.; Gers, F.; Eck, D.; Schmidhuber, J.; Gers, F. (2002). “Learning nonregular languages: A comparison of simple recurrent networks and LSTM”. Neural Computation 14 (9): 2039–2041. doi:10.1162/089976602320263980. PMID 12184841.

[peepholeLSTM-99] Gers, F. A.; Schmidhuber, J. (2001). “LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. PMID 18249962.

[100] Perez-Ortiz, J. A.; Gers, F. A.; Eck, D.; Schmidhuber, J. (2003). “Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets”. Neural Networks 16 (2): 241–250. doi:10.1016/s0893-6080(02)00219-8.

[101] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, pp 545–552, Vancouver, MIT Press, 2009.

[102] Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. NIPS'07. USA: Curran Associates Inc.. 577–584. ISBN 9781605603520

[103] M. Baccouche, F. Mamalet, C Wolf, C. Garcia, A. Baskurt. Sequential Deep Learning for Human Action Recognition. 2nd International Workshop on Human Behavior Understanding (HBU), A.A. Salah, B. Lepri ed. Amsterdam, Netherlands. pp. 29–39. Lecture Notes in Computer Science 7065. Springer. 2011

[104] Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). “Fast model-based protein homology detection without alignment”. Bioinformatics 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.

[105] Thireou, T.; Reczko, M. (2007). “Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins”. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763.

[106] Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). Predictive Business Process Monitoring with LSTM neural networks. Lecture Notes in Computer Science. 10253. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1

[107] Choi, E.; Bahadori, M.T.; Schuetz, E.; Stewart, W.; Sun, J. (2016). “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks”. Proceedings of the 1st Machine Learning for Healthcare Conference: 301–318.
