- Proposes a novel, end-to-end architecture for generating short email responses.
- The single most important measure of its success is that it is deployed in Inbox by Gmail and assists with around 10% of all mobile responses.
- Link to the paper.
Challenges in deploying Smart Reply in a user-facing product
- Responses must always be of high quality. Ensured by constructing a target response set to select responses from.
- The likelihood of choosing the responses must be maximised. Ensured by normalising the responses and enforcing diversity.
- The system should not add latency to emails. Ensured by using a triggering model to decide if the email is suitable to undergo the response generation pipeline. Computation time is further reduced by finding an approximate best result instead of the exact best result.
- Privacy is ensured by encrypting all the data, which makes it harder to verify the model's quality and to debug the system.
- Perform actions like language detection, tokenization, sentence segmentation, etc. on the input email.
- A feed-forward neural network (with embedding layer and 3 fully connected hidden layers) to decide if the input email is suitable for suggesting responses.
- Training set of pairs (o, y) where o is the incoming message and y is a boolean variable to indicate if the message had a response.
- Unigrams and bigrams from the messages.
- Signals such as whether the recipient is in the contact list of the sender.
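
The triggering model described above can be sketched roughly as follows. This is a minimal, untrained illustration, not the deployed model: the hashing of features into embedding buckets, the ReLU activations, the layer sizes, and the names `extract_features`, `TriggerNet`, and `in_contact_list` are all assumptions made for the example.

```python
import math
import random


def extract_features(message, in_contact_list):
    """Unigrams, bigrams, and one contact-list signal (hypothetical encoding)."""
    tokens = message.lower().split()
    feats = list(tokens)
    feats += [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    if in_contact_list:
        feats.append("SIGNAL:in_contact_list")
    return feats


class TriggerNet:
    """Embedding layer followed by 3 hidden layers and a sigmoid output."""

    def __init__(self, n_buckets=1000, dim=8, hidden=(8, 8, 8), seed=0):
        rng = random.Random(seed)
        self.n_buckets = n_buckets
        # Random, untrained weights: the scores are meaningless until trained
        # on (o, y) pairs.
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                    for _ in range(n_buckets)]
        sizes = (dim,) + tuple(hidden) + (1,)
        self.layers = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def _bucket(self, feat):
        # Deterministic toy hash (Python's built-in str hash varies per process).
        return sum(ord(c) * (i + 1) for i, c in enumerate(feat)) % self.n_buckets

    def score(self, feats):
        # Sum the embeddings of all sparse features into one dense vector.
        x = [0.0] * len(self.emb[0])
        for f in feats:
            x = [a + b for a, b in zip(x, self.emb[self._bucket(f)])]
        # Three ReLU hidden layers, then a single linear output unit.
        for i, weights in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) for row in weights]
            if i < len(self.layers) - 1:
                x = [max(0.0, v) for v in x]
        # Sigmoid gives the estimated probability that the message gets a reply.
        return 1.0 / (1.0 + math.exp(-x[0]))


feats = extract_features("See you tomorrow", in_contact_list=True)
probability = TriggerNet().score(feats)
```

In a real system the score would be compared against a threshold, and only messages above it would be passed on to the more expensive response-generation step.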
- LSTM network to predict the approximate best response for an incoming message o.
- Sequence to Sequence Learning.
- Reads the input message (token by token) and encodes it into a vector representation.
- Compute softmax to get the probability of first output token given the input token sequence.
- Keep feeding in the previous response tokens and the input token sequence to compute the probability of next output token.
- During inference, approximate the most likely response either greedily, by taking the most likely token at each timestep and feeding it back, or by using beam search.
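
The two inference strategies can be compared with a small sketch. The function `toy_next_token_probs` is a hypothetical stand-in for the LSTM's softmax over the next token, with hand-picked probabilities chosen so that greedy decoding and beam search disagree; none of these names or numbers come from the paper.

```python
import math

EOS = "<eos>"


def toy_next_token_probs(message, prefix):
    """Hypothetical stand-in for P(next token | message, previous tokens)."""
    table = {
        (): {"sounds": 0.6, "yes": 0.4},
        ("sounds",): {"good": 0.6, "great": 0.4},
        ("sounds", "good"): {EOS: 1.0},
        ("sounds", "great"): {EOS: 1.0},
        ("yes",): {EOS: 1.0},
    }
    return table[tuple(prefix)]


def greedy_decode(message, probs_fn, max_len=10):
    """Pick the single most likely token at each timestep and feed it back."""
    prefix, logp = [], 0.0
    while len(prefix) < max_len:
        dist = probs_fn(message, prefix)
        token = max(dist, key=dist.get)
        logp += math.log(dist[token])
        if token == EOS:
            break
        prefix.append(token)
    return prefix, logp


def beam_search(message, probs_fn, beam_width=2, max_len=10):
    """Keep the beam_width best partial responses at each timestep."""
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for token, p in probs_fn(message, prefix).items():
                if p <= 0.0:
                    continue
                new_logp = logp + math.log(p)
                if token == EOS:
                    finished.append((prefix, new_logp))
                else:
                    candidates.append((prefix + [token], new_logp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[1])


greedy_tokens, greedy_logp = greedy_decode("lunch at noon?", toy_next_token_probs)
beam_tokens, beam_logp = beam_search("lunch at noon?", toy_next_token_probs)
# greedy_tokens == ["sounds", "good"]  (probability 0.36)
# beam_tokens   == ["yes"]             (probability 0.40)
```

Greedy decoding commits to "sounds" because it is the most likely first token, while beam search keeps "yes" alive and finds that the complete response "yes" is more probable overall.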
- Generate a set of high-quality responses that also capture the variability in the intent of the response.
- Canonicalize the email response by extracting the semantic structure using a dependency parser.
- Partition all response messages into "semantic" clusters.
- These semantic clusters define the response space for scoring and selecting possible responses and for promoting diversity among the responses.
- Since a large, labelled dataset is not available, a graph based, semi-supervised approach is used.
- Manually define a few clusters with a small number of example responses for each cluster.
- Construct a graph with frequent response messages (including the labelled nodes) as response nodes (V_R).
- For each response node, extract a set of feature nodes (V_F) corresponding to features like skip-grams and n-grams, and add an edge between the response node and each of its feature nodes.
- Learn a semantic labelling for all response nodes by propagating semantic intent information (available because of labelled nodes) throughout the graph.
- After some iterations, sample some of the unlabeled nodes from the graph, manually label these sample nodes and repeat this algorithm until convergence.
- For validation, extract the top k members of each cluster and validate their quality with the help of human evaluators.
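
The graph construction and propagation steps can be illustrated with a simplified label-propagation sketch. It is an assumption-laden toy, not the algorithm used in the paper: it uses only word n-grams as feature nodes, averages label distributions across edges, keeps the manually labelled seed nodes fixed, and omits the manual relabelling loop. The names `ngram_features` and `propagate_labels` are hypothetical.

```python
from collections import defaultdict


def ngram_features(text, n_max=2):
    """Feature nodes for one response: its word unigrams and bigrams."""
    tokens = text.lower().split()
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def propagate_labels(responses, seeds, iterations=10):
    """Spread seed cluster labels over a response/feature bipartite graph."""
    # Edges: each response node is linked to its feature nodes.
    feats_of = {r: ngram_features(r) for r in responses}
    responses_of = defaultdict(set)
    for r, feats in feats_of.items():
        for f in feats:
            responses_of[f].add(r)

    labels = sorted(set(seeds.values()))
    # One label distribution per response node; seeds start one-hot.
    dist = {r: {l: 0.0 for l in labels} for r in responses}
    for r, l in seeds.items():
        dist[r][l] = 1.0

    for _ in range(iterations):
        # Feature nodes take the average of their neighbouring responses.
        feat_dist = {
            f: {l: sum(dist[r][l] for r in rs) / len(rs) for l in labels}
            for f, rs in responses_of.items()
        }
        # Unlabelled response nodes take the average of their features.
        for r in responses:
            fs = feats_of[r]
            if r in seeds or not fs:
                continue
            dist[r] = {l: sum(feat_dist[f][l] for f in fs) / len(fs)
                       for l in labels}

    result = {}
    for r in responses:
        best = max(labels, key=lambda l: dist[r][l])
        # Nodes that received no label mass stay unlabelled (None).
        result[r] = best if dist[r][best] > 0.0 else None
    return result


responses = ["thanks a lot", "thanks so much", "sounds good",
             "sounds good to me", "ha ha"]
seeds = {"thanks a lot": "thanks", "sounds good": "agree"}
clusters = propagate_labels(responses, seeds)
```

Here "thanks so much" inherits the "thanks" label through the shared unigram, "sounds good to me" inherits "agree", and "ha ha" remains unlabelled; unlabelled nodes like this are the ones that would be sampled for manual labelling in the next round.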
- Provide users with a varied set of responses by omitting redundant responses (by not selecting more than one response from any semantic cluster) and by enforcing negative (or positive) responses.
- If the top two responses contain at least one positive (negative) response and none of the top three responses is negative (positive), the third response is replaced with a negative (positive) one.
- This is done by performing a second LSTM pass where the search is restricted to only positive (or negative) responses in the target set.
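
The two diversity rules can be sketched as below. The cluster and polarity lookups, the example responses, and the function names are hypothetical; in particular, the replacement response here is simply taken from the already ranked candidate list, which stands in for the second LSTM pass restricted to positive (or negative) responses.

```python
def select_diverse(ranked, cluster_of, k=3):
    """Walk the ranked list, keeping at most one response per semantic cluster."""
    chosen, seen = [], set()
    for response in ranked:
        cluster = cluster_of[response]
        if cluster in seen:
            continue
        chosen.append(response)
        seen.add(cluster)
        if len(chosen) == k:
            break
    return chosen


def enforce_polarity(chosen, ranked, polarity_of):
    """Swap the third suggestion for one of the missing polarity if needed."""
    if len(chosen) < 3:
        return chosen
    polarities = [polarity_of.get(r, "neutral") for r in chosen]
    for have, want in (("positive", "negative"), ("negative", "positive")):
        if have in polarities[:2] and want not in polarities:
            # Stand-in for the second, polarity-restricted scoring pass.
            replacement = next(
                (r for r in ranked if polarity_of.get(r) == want), None)
            if replacement is not None:
                return chosen[:2] + [replacement]
    return chosen


# Candidates sorted from most to least likely (toy data).
ranked = ["Yes, I can.", "Sure, I can.", "Sounds good.",
          "I'll be there.", "Sorry, I can't."]
cluster_of = {
    "Yes, I can.": "can_do", "Sure, I can.": "can_do",
    "Sounds good.": "agree", "I'll be there.": "attend",
    "Sorry, I can't.": "cannot",
}
polarity_of = {
    "Yes, I can.": "positive", "Sure, I can.": "positive",
    "Sounds good.": "positive", "I'll be there.": "positive",
    "Sorry, I can't.": "negative",
}

top3 = select_diverse(ranked, cluster_of)
suggestions = enforce_polarity(top3, ranked, polarity_of)
# top3        == ["Yes, I can.", "Sounds good.", "I'll be there."]
# suggestions == ["Yes, I can.", "Sounds good.", "Sorry, I can't."]
```

The cluster filter drops "Sure, I can." as redundant with "Yes, I can.", and the polarity rule then replaces the third all-positive suggestion with a negative one so the user can decline in one tap.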
- The system is already in production and assists with around 10% of all mobile responses.