In general, deep learning models are not naturally suited to structured learning problems.
In this paper, the authors introduce a variant of deep recurrent neural networks that learns to parse a sentence by predicting the transitions of a shift-reduce parser. One of their main contributions is a batched version of this algorithm: despite the variability of tree structures across examples, they devise a formulation that processes many sentences in parallel.
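To make the shift-reduce mechanism concrete, here is a minimal illustrative sketch (not the paper's implementation): a SHIFT moves the next word from the buffer onto a stack, and a REDUCE pops the top two stack elements and pushes their composition. The `compose` function is a placeholder for the model's learned composition (in practice a neural network such as a TreeLSTM cell); here it just builds a nested tuple so the induced tree is visible.

```python
# Sketch of shift-reduce sentence composition; names and details are
# illustrative, not taken from the paper's code.

SHIFT, REDUCE = 0, 1

def compose(left, right):
    # Placeholder for a learned composition function; here we simply
    # build a nested tuple so the resulting tree structure is visible.
    return (left, right)

def run_parser(tokens, transitions):
    """Apply a SHIFT/REDUCE transition sequence and return the root."""
    stack, buffer = [], list(tokens)
    for t in transitions:
        if t == SHIFT:
            stack.append(buffer.pop(0))   # move next word onto the stack
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))  # merge top two elements
    return stack[-1]

# "the cat sat" parsed as ((the cat) sat):
tree = run_parser(["the", "cat", "sat"],
                  [SHIFT, SHIFT, REDUCE, SHIFT, REDUCE])
print(tree)  # (('the', 'cat'), 'sat')
```

In the model itself, the transition at each step is predicted by the network rather than given, and batching works because every example, whatever its tree shape, reduces to the same two stack operations executed step by step.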
They apply this unusual architecture to the natural language inference task.