# Introduction

- The paper proposes a new RNN Encoder-Decoder architecture that can improve the performance of statistical machine translation (SMT) systems.
- Link to the paper

# RNN Encoder-Decoder

- Model consists of two RNNs:
  - Encoder: Learns to encode a variable-length input sequence into a fixed-length vector representation.
  - Decoder: Learns to decode a given fixed-length vector representation into a variable-length target sequence.
- Two networks are trained jointly to maximise the conditional probability of the target sequence given the input source sequence.
- Trained model can be used to:
  - generate a target sequence, given an input sequence.
  - score a given pair of input and output sequences.
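
The two uses listed above can be illustrated with a minimal, untrained sketch. It assumes plain tanh recurrences rather than the gated unit, toy sizes, random weights, and a decoder state that simply starts from the summary vector; all names (`encode`, `score`, `generate`, the weight matrices) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 6, 4, 5  # toy sizes, chosen only for illustration

# Random, untrained parameters; a real model learns these jointly.
E_src = rng.normal(0, 0.1, (VOCAB, EMB))  # source embeddings
E_tgt = rng.normal(0, 0.1, (VOCAB, EMB))  # target embeddings
W_enc = rng.normal(0, 0.1, (HID, EMB))
U_enc = rng.normal(0, 0.1, (HID, HID))
W_dec = rng.normal(0, 0.1, (HID, EMB))
U_dec = rng.normal(0, 0.1, (HID, HID))
C_dec = rng.normal(0, 0.1, (HID, HID))    # feeds the summary into every decoder step
W_out = rng.normal(0, 0.1, (VOCAB, HID))

def encode(src_ids):
    """Read a variable-length sequence and return a fixed-length summary vector."""
    h = np.zeros(HID)
    for i in src_ids:
        h = np.tanh(W_enc @ E_src[i] + U_enc @ h)
    return h

def log_softmax(z):
    z = z - z.max()  # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def decoder_step(h, prev_id, c):
    """One decoder step: new state and log-probabilities over the next token."""
    h = np.tanh(W_dec @ E_tgt[prev_id] + U_dec @ h + C_dec @ c)
    return h, log_softmax(W_out @ h)

def score(src_ids, tgt_ids, bos=0):
    """Log-probability of the target sequence given the source sequence."""
    c = encode(src_ids)
    h, prev, total = c, bos, 0.0  # assumption: decoder state starts at the summary
    for y in tgt_ids:
        h, logp = decoder_step(h, prev, c)
        total += logp[y]
        prev = y
    return total

def generate(src_ids, max_len=5, bos=0, eos=1):
    """Greedy decoding: pick the most probable token at each step."""
    c = encode(src_ids)
    h, prev, out = c, bos, []
    for _ in range(max_len):
        h, logp = decoder_step(h, prev, c)
        prev = int(np.argmax(logp))
        if prev == eos:
            break
        out.append(prev)
    return out
```

Calling `encode` on inputs of different lengths returns vectors of the same size, which is the fixed-length representation the notes refer to; `score` and `generate` share the same decoder step.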

**Hidden Unit that adaptively remembers and forgets.**

- The hidden unit is modified to have:
  - a reset gate that adaptively *drops* any hidden state information that it finds irrelevant.
  - an update gate that controls how much information from the previous state is carried over.
- Each hidden unit has separate reset and update gates, which improves the memory capacity and makes the model easier to train.
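
A single step of such a gated unit can be sketched as follows. This follows the gating equations as I understand them from the paper (reset gate applied to the previous state inside the candidate, update gate interpolating between the previous state and the candidate) and omits bias terms; the function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hid, rng):
    """Small random weights for a toy unit (illustrative scale, not the paper's)."""
    p = {k: rng.normal(0.0, 0.1, (n_hid, n_in)) for k in ("W_r", "W_z", "W")}
    p.update({k: rng.normal(0.0, 0.1, (n_hid, n_hid)) for k in ("U_r", "U_z", "U")})
    return p

def gated_step(x, h_prev, p):
    """Compute the next hidden state from input x and previous state h_prev."""
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)  # reset gate: r near 0 drops h_prev
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)  # update gate: z near 1 keeps h_prev
    h_cand = np.tanh(p["W"] @ x + p["U"] @ (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_cand
```

Because each hidden unit has its own entries in `r` and `z`, some units can learn to keep their state over many steps while others reset frequently.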

# Statistical Machine Translation (SMT)

- In the phrase-based SMT framework, the translation model is factorised into the translation probabilities of matching phrases in the source and target sentences.
- The RNN Encoder-Decoder can be used to rescore the phrase pairs in the phrase table; its score is added as an extra feature in the SMT system's log-linear model.
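
As a rough illustration of rescoring, the sketch below appends one extra score per phrase pair and combines all features in a weighted sum, as a log-linear model would. The phrase table layout, the feature values, the weights, and `toy_scorer` (a stand-in for a trained model's log-probability) are all made up for this example.

```python
from typing import Callable, Dict, List, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)

def add_model_feature(
    phrase_table: Dict[PhrasePair, List[float]],
    scorer: Callable[[str, str], float],
) -> Dict[PhrasePair, List[float]]:
    """Return a copy of the table with the scorer's output appended as a feature."""
    return {pair: feats + [scorer(*pair)] for pair, feats in phrase_table.items()}

def loglinear_score(features: List[float], weights: List[float]) -> float:
    """Weighted sum of (log-domain) feature values."""
    return sum(w * f for w, f in zip(weights, features))

def toy_scorer(src: str, tgt: str) -> float:
    # Placeholder for a trained model: penalise word-count mismatch.
    return -float(abs(len(src.split()) - len(tgt.split())))

# Toy table with a single existing feature per pair (invented values).
table = {("the end", "la fin"): [-0.5], ("the end", "la"): [-0.4]}
rescored = add_model_feature(table, toy_scorer)
weights = [1.0, 0.5]  # in practice these would be tuned on held-out data
best = max(rescored, key=lambda pair: loglinear_score(rescored[pair], weights))
```

With the original single feature the second pair scores higher; after the extra feature is added, `best` is the first pair.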

# Experiments

**Details**

- 1000 hidden units.
- Activation function in the proposed hidden unit: hyperbolic tangent.
- Non-recurrent weights initialized by sampling from an isotropic Gaussian distribution (mean = 0, sd = 0.01)
- Recurrent weights initialized by sampling a matrix from a white Gaussian distribution and using its left singular vectors.
- Trained with Adadelta and stochastic gradient descent (SGD).
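
The two initialisation schemes above can be sketched with NumPy as follows. The function names are hypothetical, and unit variance for the white Gaussian sample is an assumption; since only the singular vectors are kept, the scale of the sample does not affect the result.

```python
import numpy as np

def init_nonrecurrent(shape, rng, sd=0.01):
    """Zero-mean Gaussian weights with a fixed standard deviation."""
    return rng.normal(0.0, sd, size=shape)

def init_recurrent(n, rng):
    """Sample a white Gaussian matrix and keep its left singular vectors."""
    a = rng.normal(0.0, 1.0, size=(n, n))
    u, _, _ = np.linalg.svd(a)  # columns of u are the left singular vectors
    return u                    # u is orthogonal: u @ u.T is the identity
```

An orthogonal recurrent matrix preserves the norm of the hidden state under the linear part of the recurrence, which is a common motivation for this kind of initialisation.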

**Observations**

- Train the model to translate an English phrase to a French phrase.
- Using the model to score phrase pairs in the standard phrase-based SMT system improves the translation performance.
- Train a CSLM (Continuous Space Language Model) and compare phrase scores from the trained model with those given by the CSLM.
- The RNN Encoder-Decoder is better at capturing the linguistic regularities in the phrase table.
- RNN Encoder-Decoder learns a continuous space representation for phrases that preserves both the semantic and syntactic structure.