LSTMs & GRUs

Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are gated RNN architectures designed to capture long-term dependencies in sequential data.

The Problem with Standard RNNs

Standard RNNs suffer from vanishing (and sometimes exploding) gradients: as the error signal is backpropagated through many time steps, it is repeatedly multiplied by the recurrent weights, so it shrinks toward zero and long-range dependencies become very hard to learn. LSTMs and GRUs mitigate this with gating mechanisms that give gradients a more direct path through time.
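
As a rough, illustrative sketch (not part of the original text), the snippet below repeatedly multiplies a gradient vector by the same recurrent weight matrix, mimicking backpropagation through time in a linearized RNN; the matrix size and scale are arbitrary choices.

python
import numpy as np

# Illustrative only: repeated multiplication by the recurrent weights during
# backpropagation through time shrinks the gradient exponentially when the
# weights' spectral radius is below 1 (and blows it up when it is above 1).
np.random.seed(0)

W = 0.3 * np.random.randn(8, 8)   # small recurrent weight matrix (arbitrary scale)
grad = np.ones(8)                 # gradient arriving at the final time step

for step in range(1, 51):
    grad = W.T @ grad             # one step of backpropagation through time
    if step % 10 == 0:
        print(f"after {step:2d} steps: gradient norm = {np.linalg.norm(grad):.2e}")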

LSTM

  • Forget Gate: Decides what information to discard from the cell state
  • Input Gate: Decides what new information to write to the cell state
  • Output Gate: Decides how much of the cell state to expose as the hidden state
  • Cell State: Long-term memory carried across time steps (see the sketch after this list)
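
A minimal NumPy sketch of a single LSTM step, assuming hidden size H and matrices W, U, b that stack the forget/input/output/candidate blocks; the function name and shapes are illustrative, and practical implementations (for example torch.nn.LSTM) add batching, initialization, and multiple layers.

python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the forget/input/output/candidate blocks."""
    z = W @ x + U @ h_prev + b                    # all four pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates in (0, 1)
    g = np.tanh(g)                                # candidate values to write
    c = f * c_prev + i * g                        # cell state: keep some old, add some new
    h = o * np.tanh(c)                            # hidden state: gated view of the cell
    return h, c

# Toy usage: input size 3, hidden size 4 (values are random, for shape-checking only).
rng = np.random.default_rng(0)
H, D = 4, 3
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)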

GRU

  • Update Gate: Controls how much of the previous state is replaced by the new candidate state
  • Reset Gate: Decides how much of the previous state to use when forming the candidate
  • Simpler than LSTM (fewer parameters, no separate cell state)
  • Often performs comparably to LSTM in practice (see the sketch after this list)
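
A comparable sketch of one GRU step (again illustrative, with separate weight matrices per gate and biases omitted for brevity):

python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step with separate (illustrative) weight matrices per gate."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate: how much to change
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate: how much history to reuse
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_tilde          # blend old state with the candidate

Because a GRU has only two gates and no separate cell state, it needs fewer parameters than an LSTM with the same hidden size.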

Gate Mechanism Example

The simplified sketch below shows how a sigmoid gate scales a vector of values to control how much information passes through. The numbers and variable names are illustrative only.

python
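import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values: candidate information and a raw "gate" score per element.
values = np.array([0.9, -0.5, 2.0, 0.1])
gate_scores = np.array([4.0, -4.0, 0.0, 2.0])

gate = sigmoid(gate_scores)   # squashed to (0, 1): how much of each value may pass
passed = gate * values        # element-wise gating, as inside LSTM/GRU cells

print("gate:   ", np.round(gate, 2))
print("passed: ", np.round(passed, 2))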