Bidirectional model with forward (sentence given in the correct order) and backward (sentence given in reverse order) encoders of 1200 dimensions each.
Combine-skip - concatenation of the uni-skip and bi-skip vectors.
Recurrent matrices - orthogonal initialization.
Non-recurrent matrices - initialized from a uniform distribution in [-0.1, 0.1].
Mini-batches of size 128.
Gradient clipping at norm = 10 (see the sketch of these initialization and clipping choices below).
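A minimal sketch of the initialization and clipping choices above, written in PyTorch for illustration (the paper's original code was not PyTorch, and the module/parameter names below follow PyTorch conventions; the 620-d word-embedding input and Adam optimizer match the paper's setup):

```python
import torch
import torch.nn as nn

# GRU encoder: 620-d word embeddings in, 1200-d hidden state out.
encoder = nn.GRU(input_size=620, hidden_size=1200, batch_first=True)

for name, param in encoder.named_parameters():
    if "weight_hh" in name:       # recurrent matrices: orthogonal init
        nn.init.orthogonal_(param)
    elif "weight_ih" in name:     # non-recurrent matrices: U(-0.1, 0.1)
        nn.init.uniform_(param, -0.1, 0.1)
    elif "bias" in name:
        nn.init.zeros_(param)

optimizer = torch.optim.Adam(encoder.parameters())

def training_step(loss):
    """One update with the global gradient norm clipped at 10."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=10.0)
    optimizer.step()
```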
After learning skip-thoughts, freeze the model and use the encoder as a feature extractor only.
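A hedged sketch of the frozen-encoder feature extraction; `uni_encoder`, `bi_encoder`, and their `encode` method are hypothetical wrappers around the trained models, and the dimensions follow the paper (2400-d uni-skip, 2400-d bi-skip = 1200 forward + 1200 backward):

```python
import numpy as np

def combine_skip(sentences, uni_encoder, bi_encoder):
    """Map sentences to combine-skip vectors with frozen encoders
    (no gradients flow through the pretrained weights)."""
    uni = uni_encoder.encode(sentences)       # (n, 2400) uni-skip vectors
    bi = bi_encoder.encode(sentences)         # (n, 2400) bi-skip vectors
    return np.concatenate([uni, bi], axis=1)  # (n, 4800) combine-skip features
```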
Evaluated the vectors with linear models on the following tasks:
Semantic relatedness - given a sentence pair, predict how closely related the two sentences are.
The skip-thoughts method outperforms all systems from the SemEval 2014 competition and is outperformed only by dependency tree-LSTMs.
Using features learned from an image-sentence embedding model on COCO boosts performance and brings it on par with dependency tree-LSTMs.
Paraphrase detection - skip-thoughts outperforms recursive nets with dynamic pooling when no hand-crafted features are used.
skip-thoughts with basic pairwise statistics (sketched below) produces results comparable to state-of-the-art systems that rely on complicated, hand-engineered features.
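The pairwise features for sentence pairs are the component-wise product and absolute difference of the two skip-thought vectors, as described in the paper; the classifier and the placeholder data below are illustrative (scikit-learn logistic regression standing in for the paper's linear models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(u, v):
    """Component-wise product and absolute difference of the two
    sentence vectors, the pairwise features used in the paper."""
    return np.concatenate([u * v, np.abs(u - v)], axis=1)

# Placeholder data standing in for skip-thought vectors of sentence pairs.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(100, 4800))
X_b = rng.normal(size=(100, 4800))
y = rng.integers(0, 2, size=100)  # e.g. paraphrase / not-paraphrase labels

clf = LogisticRegression(max_iter=1000).fit(pair_features(X_a, X_b), y)
```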
Image-sentence ranking on the MS COCO dataset:
Image annotation - given an image, rank the sentences by how well they describe the image.
Image search - given a caption, find the image that is being described.
Though the system does not outperform the baseline system in all cases, the results do indicate that skip-thought vectors can capture image descriptions without having to learn their representations from scratch.
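A hedged sketch of the ranking step: both tasks score candidates by cosine similarity in a shared space, assuming the image and caption vectors have already been projected there by learned linear maps (names below are illustrative, not the paper's code):

```python
import numpy as np

def rank_captions(image_vec, caption_vecs):
    """Rank candidate captions by cosine similarity to one image
    embedding; returns caption indices, best match first."""
    img = image_vec / np.linalg.norm(image_vec)
    caps = caption_vecs / np.linalg.norm(caption_vecs, axis=1, keepdims=True)
    return np.argsort(-(caps @ img))

# Image search is the transpose: score candidate image embeddings
# against a single caption embedding in the same way.
```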
On classification benchmarks, skip-thoughts perform about as well as bag-of-words baselines but are outperformed by methods whose sentence representations are learnt for the task at hand.
Combining skip-thoughts with bi-gram Naive Bayes (NB) features improves performance (see the sketch below).
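A hedged sketch of that combination, assuming NB log-count-ratio features over uni/bi-grams in the style of Wang & Manning (2012); the corpus, labels, and `skip_vectors` are placeholders, and the exact pipeline may differ from the paper's:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder corpus; skip_vectors stands in for the encoder's output.
docs = ["a fine movie", "a dull movie", "great acting", "boring plot"]
labels = np.array([1, 0, 1, 0])
skip_vectors = np.random.default_rng(0).normal(size=(len(docs), 4800))

# Binarized uni/bi-gram counts and NB log-count ratios per feature.
vec = CountVectorizer(ngram_range=(1, 2), binary=True)
X = vec.fit_transform(docs)
pos = np.asarray(X[labels == 1].sum(axis=0)).ravel() + 1.0
neg = np.asarray(X[labels == 0].sum(axis=0)).ravel() + 1.0
r = np.log((pos / pos.sum()) / (neg / neg.sum()))

# Scale the n-gram features by the ratios, append skip-thought vectors,
# and train a linear classifier on the concatenation.
features = hstack([csr_matrix(skip_vectors), X.multiply(r)]).tocsr()
clf = LogisticRegression(max_iter=1000).fit(features, labels)
```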
Variants to be explored include:
Fine-tuning the encoder-decoder model during the downstream task instead of freezing the weights.
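A minimal sketch of what that variant could look like in PyTorch, under the common assumption that pretrained weights are fine-tuned with a smaller learning rate than a freshly initialized task head (modules and rates below are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

# Stand-ins for the pretrained encoder and a new task-specific layer.
encoder = nn.GRU(input_size=620, hidden_size=1200, batch_first=True)
task_head = nn.Linear(1200, 2)

# Let the task loss update the encoder too, but more gently than the head.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-5},   # pretrained weights
    {"params": task_head.parameters(), "lr": 1e-3}, # new task head
])
```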