Estimate conditional probability p(a|c, q), where c is a context document, q is a query related to the document, and a is the answer to that query.
Use online newspapers (CNN and DailyMail) and their matching summaries.
Parse summaries and bullet points into Cloze-style questions.
Generate a corpus of document-query-answer triplets by replacing one entity at a time with a placeholder.
Data is anonymised and randomised using a coreference system, abstract entity markers, and random permutation of the entity markers.
The processed dataset evaluates reading comprehension more directly, since models cannot exploit co-occurrence statistics and must rely on the context document.
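The corpus construction steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the entity list has already been produced by a coreference system, and the function names, the `@entity`/`@placeholder` marker strings, and the `seed` parameter are choices made for this sketch.

```python
import random
import re

PLACEHOLDER = "@placeholder"

def anonymise(texts, entities, rng):
    """Swap each entity name for an abstract marker, permuted at random."""
    if not entities:
        return list(texts), {}
    markers = [f"@entity{i}" for i in range(len(entities))]
    rng.shuffle(markers)  # random permutation of the entity markers
    mapping = dict(zip(entities, markers))
    # Longest names first, so "New York City" wins over "New York".
    pattern = re.compile("|".join(
        re.escape(name) for name in sorted(entities, key=len, reverse=True)))
    # A single regex pass avoids rewriting text that was already replaced.
    return [pattern.sub(lambda m: mapping[m.group(0)], t) for t in texts], mapping

def make_cloze_triples(document, bullet, entities, seed=0):
    """Build one (document, query, answer) triple per entity in the bullet."""
    rng = random.Random(seed)
    (doc, point), mapping = anonymise([document, bullet], entities, rng)
    triples = []
    for marker in mapping.values():
        # \b stops "@entity1" from matching inside "@entity10".
        query, n = re.subn(re.escape(marker) + r"\b", PLACEHOLDER, point, count=1)
        if n:
            triples.append((doc, query, marker))
    return triples
```

Because the markers are reshuffled per example, a marker carries no meaning across documents, so the answer has to be read off the context document.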
Maximum frequency baseline: picks the most frequently observed entity in the context document.
Exclusive frequency baseline: picks the most frequently observed entity in the context document which is not observed in the query.
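Both frequency baselines reduce to counting entity markers. A minimal sketch, assuming the text is already tokenised and that entities appear as tokens starting with `@entity` (a convention assumed here, as are the function names):

```python
from collections import Counter

def is_entity(token):
    # Assumption of this sketch: anonymised entities look like "@entity12".
    return token.startswith("@entity")

def maximum_frequency(doc_tokens):
    """Most frequent entity in the context document."""
    counts = Counter(t for t in doc_tokens if is_entity(t))
    return counts.most_common(1)[0][0] if counts else None

def exclusive_frequency(doc_tokens, query_tokens):
    """Most frequent document entity that does not occur in the query."""
    in_query = set(query_tokens)
    counts = Counter(
        t for t in doc_tokens if is_entity(t) and t not in in_query)
    return counts.most_common(1)[0][0] if counts else None
```

For a document where `@entity1` occurs three times and `@entity2` twice, and a query that mentions `@entity1`, the first function returns `@entity1` and the second returns `@entity2`.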
Symbolic Matching Models
Parse the sentence to find predicates to answer questions like "who did what to whom".
Extract entity-predicate triples (e1, V, e2) from the query q and the context document d.
Resolve queries using rules like exact match, matching entity etc.
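A minimal sketch of rule-based resolution over such triples, covering only the two rules named above. It assumes the triples have already been extracted by a parser, that the placeholder sits in the first slot of the query triple, and that rules are tried in the order shown; all three are simplifications made for illustration.

```python
PLACEHOLDER = "@placeholder"

def resolve(query_triple, doc_triples):
    """Return (answer, rule_name) for a query triple (placeholder, V, e2)."""
    subject, verb, obj = query_triple
    if subject != PLACEHOLDER:
        raise ValueError("this sketch expects the placeholder in the first slot")
    # Exact match: (p, V, y) is answered by a document triple (x, V, y).
    for e1, v, e2 in doc_triples:
        if v == verb and e2 == obj:
            return e1, "exact match"
    # Matching entity: (p, V, y) is answered by (x, Z, y) for any predicate Z.
    for e1, _, e2 in doc_triples:
        if e2 == obj:
            return e1, "matching entity"
    return None, "no rule fired"
```

When no rule fires, a real system needs some fallback, for example one of the frequency baselines.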
Word Distance Benchmark
Align placeholder of Cloze form questions with each possible entity in the context document and calculate the distance between the question and the context around the aligned entity.
Sum the distance of every word in q to its nearest aligned word in d.
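One way to read the word distance benchmark is sketched below. The details are assumptions of this sketch rather than the paper's exact procedure: alignment is plain string equality (no coreference), each query word is expected at the same offset from the candidate entity as it has from the placeholder, and the per-word distance is capped at `max_penalty`.

```python
def word_distance_answer(doc_tokens, query_tokens, max_penalty=8):
    """Return (score, entity) for the best-aligned entity, or None."""
    ph = query_tokens.index("@placeholder")
    # Index every position at which each document word occurs.
    positions = {}
    for i, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(i)

    best = None
    for i, tok in enumerate(doc_tokens):
        if not tok.startswith("@entity"):
            continue  # only entities are candidate answers
        score = 0
        for k, word in enumerate(query_tokens):
            if k == ph:
                continue
            # Where this word would sit if the query were laid over the
            # document with the placeholder on top of the candidate.
            expected = i + (k - ph)
            dists = [abs(p - expected) for p in positions.get(word, [])]
            # Missing words and far-away matches both cost max_penalty.
            score += min(dists + [max_penalty])
        if best is None or score < best[0]:
            best = (score, tok)
    return best
```

The entity with the lowest total distance is chosen, so a candidate whose surrounding words reproduce the query scores 0.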
Neural Network Models
Deep LSTM Reader
Test the ability of Deep LSTM encoders to handle significantly longer sequences.
Feed the document query pair as a single large document, one word at a time.
Use a Deep LSTM cell with skip connections from each input to every hidden layer, and from every hidden layer to the output.
Attentive Reader
Employ an attention mechanism to overcome the bottleneck of a fixed-width hidden vector.
Encode the document and the query using separate bidirectional single-layer LSTMs.
Query encoding is obtained by concatenating the final forward and backward outputs.
Document encoding is obtained by a weighted sum of the per-token output vectors (each obtained by concatenating the forward and backward outputs at that token).
The weights can be interpreted as the degree to which the network attends to a particular token in the document.
The model is completed by a non-linear combination of the document and query embeddings.
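The attention step described above can be sketched in plain Python. The sketch assumes the bidirectional LSTM outputs are already computed: `y_d` holds one vector per document token and `u` is the query encoding. The weight names (`W_ym`, `W_um`, `w_ms`, `W_rg`, `W_ug`) and the tanh scoring function are assumptions of this sketch, and the helper functions are illustrative.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_reader(y_d, u, W_ym, W_um, w_ms, W_rg, W_ug):
    """Return (attention weights s, document encoding r, joint embedding g)."""
    query_part = matvec(W_um, u)  # identical for every document token
    scores = []
    for y_t in y_d:
        # m(t) = tanh(W_ym y_d(t) + W_um u); score(t) = w_ms . m(t)
        m_t = [math.tanh(a + b) for a, b in zip(matvec(W_ym, y_t), query_part)]
        scores.append(sum(w * m for w, m in zip(w_ms, m_t)))
    s = softmax(scores)  # how strongly each token is attended to
    # r is the attention-weighted sum of the document token vectors.
    dim = len(y_d[0])
    r = [sum(s_t * y_t[j] for s_t, y_t in zip(s, y_d)) for j in range(dim)]
    # g = tanh(W_rg r + W_ug u): non-linear combination of the two encodings.
    g = [math.tanh(a + b) for a, b in zip(matvec(W_rg, r), matvec(W_ug, u))]
    return s, r, g
```

The weights `s` sum to 1 over the document tokens, which is what makes them plottable as a heat map.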
Impatient Reader
As an add-on to the Attentive Reader, the model can re-read the document as each query token is read.
The model accumulates information from the document as each query token is seen, and finally outputs a joint document-query representation in the form of a non-linear combination of the document embedding and the query embedding.
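The re-reading loop can be sketched as a recurrence over query tokens. As before, this assumes precomputed encoder outputs (`y_d` per document token, `y_q` per query token, `u` for the whole query). The weight names, the zero initial state, and the exact form of the update are assumptions of this sketch, not a statement of the authors' implementation.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def impatient_reader(y_d, y_q, u, W_dm, W_rm, W_qm, w_ms, W_rr, W_rg, W_qg):
    """Return (final document representation r, joint embedding g)."""
    dim = len(y_d[0])
    r = [0.0] * dim  # assumed initial state before any query token is read
    doc_part = [matvec(W_dm, y_t) for y_t in y_d]  # independent of the query
    for y_qi in y_q:
        # Context shared by all document tokens at this query step.
        ctx = [a + b for a, b in zip(matvec(W_rm, r), matvec(W_qm, y_qi))]
        scores = []
        for d_t in doc_part:
            m = [math.tanh(a + b) for a, b in zip(d_t, ctx)]
            scores.append(sum(w * v for w, v in zip(w_ms, m)))
        s = softmax(scores)  # fresh attention over the document
        attended = [sum(s_t * y_t[j] for s_t, y_t in zip(s, y_d))
                    for j in range(dim)]
        # Accumulate: new reading plus a transformed copy of the old state.
        r = [a + math.tanh(b) for a, b in zip(attended, matvec(W_rr, r))]
    g = [math.tanh(a + b) for a, b in zip(matvec(W_rg, r), matvec(W_qg, u))]
    return r, g
```

Unlike the single attention pass of the Attentive Reader, the attention weights here are recomputed once per query token, conditioned on what has been accumulated so far.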
The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
The Frame-Semantic pipeline (the symbolic matching approach) does not scale to cases where several sentences, and thus several frames, are needed to answer a query.
Moreover, it provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
Word Distance approach outperformed the Frame-Semantic approach as there was significant lexical overlap between the query and the document.
The paper also includes heat maps over the context documents to visualise the attention mechanism.