Paper Summary: Learning To Execute

- Evaluating if LSTMs can express and learn short, simple programs (linear time, constant memory) in the sequence-to-sequence framework.

- Formulate program evaluation task as a sequence-to-sequence learning problem using RNNs.
- Train on short programs that can be evaluated in linear time and constant memory, because an RNN makes only a single pass over the input and has limited memory.
- Two parameters control the difficulty of a program:
  - length : Number of digits in the integers that appear in the program.
  - nesting : Number of times operations can be combined with each other.
- The LSTM reads the input program one character at a time and produces the output one character at a time.
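To make the two difficulty parameters concrete, here is a minimal sketch of a toy program generator. It is not the paper's generator: `make_program`, `target_output`, the restriction to `+` and `-`, and the `.` end marker are all assumptions made for illustration.

```python
import random

def make_program(length, nesting, rng):
    """Build a toy arithmetic program (hypothetical, not the paper's generator)."""
    def operand():
        # An integer with at most `length` digits.
        return str(rng.randint(1, 10 ** length - 1))

    expr = operand()
    for _ in range(nesting):
        # Each nesting level combines the expression so far with one more operand.
        op = rng.choice(["+", "-"])
        expr = f"({expr}{op}{operand()})"
    return f"print({expr})"

def target_output(program):
    """Evaluate the expression inside print(...) to obtain the target string."""
    expr = program[len("print("):-1]
    # eval is acceptable here only because expr is built from digits, +, - and parentheses.
    return str(eval(expr))

rng = random.Random(0)
program = make_program(length=3, nesting=2, rng=rng)

# The model reads the program and emits the result one character at a time.
source_chars = list(program)
target_chars = list(target_output(program)) + ["."]  # "." as end marker is an assumption
```

Raising `length` makes each operand longer, while raising `nesting` makes the expression deeper, so the two knobs vary difficulty independently.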

- **Addition Task** - Given two numbers, the model learns to add them. This task becomes the baseline for comparing performance on other tasks.
- **Memorization Task** - Given a random number, the model memorizes it and outputs it.

The following techniques improve the accuracy of the model:

- **Input reversing** - Reversing the order of the input while keeping the output fixed introduces many short-term dependencies that help the LSTM learn.
- **Input doubling** - Presenting the same input to the network twice improves performance, since the model gets to look at the input twice.
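The two input transformations are simple string operations. The sketch below shows one way to apply them; the function names and the `;` separator used when doubling are assumptions, not details taken from the paper.

```python
def reverse_input(source):
    """Reverse the input characters; the target sequence is left unchanged."""
    return source[::-1]

def double_input(source, sep=";"):
    """Present the same input twice. The separator is an illustrative assumption."""
    return source + sep + source

reversed_src = reverse_input("123+456")   # "654+321"
doubled_src = double_input("123+456")     # "123+456;123+456"
```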

Curriculum learning gradually increases the difficulty of the programs fed to the system during training.

- **No Curriculum (baseline)** - Fixed length and fixed nesting programs are fed to the system.
- **Naive Curriculum** - Start with length = 1 and nesting = 1 and keep increasing the values iteratively.
- **Mix Strategy** - Randomly choose length and nesting to generate a mix of easy and difficult examples.
- **Combined Strategy** - Each training example is obtained either by the naive curriculum strategy or by the mix strategy.
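The four strategies differ only in how they pick `(length, nesting)` for the next training example, which can be sketched as below. The names, the order in which the naive curriculum grows its values (length first, then nesting), the trigger for advancing, and the 50/50 split in the combined strategy are all assumptions for illustration.

```python
import random

def baseline_sample(target_length, target_nesting):
    """No curriculum: always train at the target difficulty."""
    return target_length, target_nesting

def mix_sample(target_length, target_nesting, rng):
    """Mix: draw each difficulty value uniformly up to its target."""
    return rng.randint(1, target_length), rng.randint(1, target_nesting)

class NaiveCurriculum:
    """Naive: start at (1, 1) and step the difficulty up towards the target."""

    def __init__(self, target_length, target_nesting):
        self.target_length = target_length
        self.target_nesting = target_nesting
        self.length = 1
        self.nesting = 1

    def sample(self):
        return self.length, self.nesting

    def advance(self):
        # Assumed schedule: grow length first, then nesting. The caller decides
        # when to advance (for example, when validation accuracy plateaus).
        if self.length < self.target_length:
            self.length += 1
        elif self.nesting < self.target_nesting:
            self.nesting += 1

def combined_sample(naive, rng, p_naive=0.5):
    """Combined: each example comes from either the naive or the mix strategy."""
    if rng.random() < p_naive:
        return naive.sample()
    return mix_sample(naive.target_length, naive.target_nesting, rng)

rng = random.Random(0)
naive = NaiveCurriculum(target_length=4, target_nesting=3)
batch_difficulties = [combined_sample(naive, rng) for _ in range(5)]
```

Because the combined strategy always includes some mixed-difficulty examples, the model sees hard programs from the start even while the naive schedule is still at low difficulty.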

- 2 layers, unrolled for 50 steps.
- 400 cells per layer.
- Parameters initialized uniformly in [-0.08, 0.08]
- Minibatch size of 100.
- Gradient rescaled so that its norm is less than 5.
- Learning rate starts at 0.5 and is further decreased by 0.8 after reaching the target accuracy of 95%.
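The initialization, gradient clipping and learning-rate schedule listed above can be sketched in plain Python on flat lists of floats. This is an illustration rather than the authors' training code: the helper names are hypothetical, and reading "decreased by 0.8" as multiplication by 0.8 is an assumption.

```python
import math
import random

def init_uniform(n, rng, scale=0.08):
    """Draw n parameters uniformly from [-scale, scale]."""
    return [rng.uniform(-scale, scale) for _ in range(n)]

def clip_by_norm(grads, max_norm=5.0):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    factor = max_norm / norm
    return [g * factor for g in grads]

def next_learning_rate(lr, accuracy, target=0.95, decay=0.8):
    """Decay the learning rate once the target accuracy is reached.

    Assumption: "decreased by 0.8" is interpreted as lr * 0.8.
    """
    return lr * decay if accuracy >= target else lr

rng = random.Random(0)
params = init_uniform(10, rng)
clipped = clip_by_norm([30.0, 40.0])      # norm 50 is rescaled down to norm 5
lr = next_learning_rate(0.5, accuracy=0.96)
```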

Teacher forcing is used when computing accuracy, i.e., when predicting the i-th digit, the correct first i-1 digits of the output are provided as input to the LSTM.
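A minimal sketch of this evaluation scheme is shown below. `predict_next` stands in for the trained model and `always_one` is a deliberately trivial placeholder; both are hypothetical names introduced for illustration.

```python
def teacher_forced_accuracy(predict_next, source, target):
    """Per-character accuracy where each step sees the correct output prefix."""
    correct = 0
    for i in range(len(target)):
        # Condition on the ground-truth prefix target[:i], not on the
        # model's own earlier predictions.
        if predict_next(source, target[:i]) == target[i]:
            correct += 1
    return correct / len(target)

def always_one(source, prefix):
    """Placeholder model that predicts the character "1" at every step."""
    return "1"

acc = teacher_forced_accuracy(always_one, "60+61", "121")
```

Since an early mistake does not propagate to later steps, this measure is more lenient than requiring the whole output sequence to be generated correctly from scratch.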

The general trend in performance is (combined, mix) > (naive, baseline).

In certain cases for program evaluation, the baseline performs better than the naive curriculum strategy. Intuitively, the model would use all its memory to store patterns for inputs of a given size. When a larger input is then provided, the model has to restructure its memory patterns to learn the output for this new class of inputs. This memory restructuring may be causing the degraded performance of the naive strategy. The combined strategy combines the naive and mix strategies and hence reduces the need to restructure the memory patterns.

While LSTMs can learn to map the character-level representation of simple programs to their correct output, the idea cannot be extended to arbitrary programs because of the runtime limitations of conventional RNNs and LSTMs. Moreover, while curriculum learning is essential, the optimal curriculum strategy needs to be understood further.


About the author:

Shagun Sodhani
