Learning to Compute Word Embeddings On the Fly
Word-based language models suffer from the problem of rare and out-of-vocabulary (OOV) words.
Learning representations for OOV words directly on the end task often results in poor representations, because these words occur too infrequently to train well.
The common alternatives are to replace all rare words with a single UNK representation (which loses information) or to use character-level models to obtain word representations (which tend to miss semantic relationships).
The paper proposes to learn a network that predicts the representation of a word from auxiliary data (referred to as definitions), such as dictionary definitions, Wikipedia infoboxes, or the spelling of the word.
The auxiliary-data encoders are trained jointly with the end task so that the computed word representations align with the task's requirements.
Given a rare word w, let d(w) = ⟨x1, x2, …⟩ denote its definition, where the xi are words.
d(w) is fed to a definition reader network f (an LSTM), and the last hidden state is used as the definition embedding e_d(w).
If w has multiple definitions, their embeddings are combined by mean pooling, as in the sketch below.
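A minimal PyTorch sketch of this reader (my own illustration, not the authors' code; the class name `DefinitionReader` and all sizes are assumptions):

```python
import torch
import torch.nn as nn

class DefinitionReader(nn.Module):
    """Illustrative definition reader: an LSTM whose last hidden
    state serves as the definition embedding e_d(w)."""

    def __init__(self, def_vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(def_vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, definitions):
        # definitions: list of LongTensors, one per definition of w,
        # each of shape (1, def_len).
        states = []
        for d in definitions:
            _, (h_n, _) = self.lstm(self.embed(d))
            states.append(h_n[-1])                  # (1, hidden_dim)
        # Multiple definitions are combined by mean pooling.
        return torch.stack(states, dim=0).mean(dim=0)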
The approach also extends to in-vocabulary words: the definition embedding of such a word is used to update (e.g., added to) its original trainable embedding.
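Continuing the sketch, the in-vocabulary case might look as follows; plain summation of the two embeddings is an assumption made here for illustration, not necessarily the paper's exact combination rule:

```python
vocab_size, def_vocab_size, hidden_dim = 10000, 5000, 256   # assumed sizes
word_emb = nn.Embedding(vocab_size, hidden_dim)
reader = DefinitionReader(def_vocab_size, emb_dim=128, hidden_dim=hidden_dim)

def embed_word(w_id, definitions):
    # Trainable embedding for the word itself.
    e = word_emb(torch.tensor([w_id]))
    if definitions:
        # Update with the mean-pooled definition embedding; an OOV
        # word (no trained row) would rely on e_d(w) alone.
        e = e + reader(definitions)
    return e
```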
Auxiliary data sources:
- Word definitions from WordNet
- Spelling of words
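In this sketch, the spelling source can be fed to the same reader by treating the word's characters as definition tokens (the paper uses spelling as one more definition source; the character IDs below are made up for illustration):

```python
# Hypothetical usage: the characters of "caravan" as its "definition".
chars = torch.tensor([[12, 3, 27, 3, 31, 3, 19]])   # c a r a v a n
e_spelling = reader([chars])                         # (1, hidden_dim)
```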
The proposed approach was evaluated on the following tasks: